UPD March 31st, 2020: In order to reproduce the results of the RSR and RSR+RA experiments, it is important to consider only those opinions that could be fitted in a single context. Please refer to the Evaluation section of this description for details.
UPD December 3rd, 2019: the core folder is now part of the AREkit framework, and the ranlp-2019 branch of the latter contains the same code as included in this repository.
This repository provides the source code of the models and related experiments conducted as part of the following paper:
- Distant Supervision for Sentiment Attitude Extraction
/
[paper-aclweb],
[poster]
- Rusnachenko Nicolay, Loukachevitch Natalia, Tutubalina Elena
- RANLP-2019
- RuSentRel-1.1 -- utilized in experiments as a source of etalon attitudes;
- RuSentiFrames-1.0 -- utilized as a list of frame entries in the frame-based approach;
- RuAttitudes-1.0 -- a collection developed in the related paper.
Check out the networks folder for implementation details.
Architecture aspects of the models are as follows:
- We utilize the Position Feature (PF) -- an embedding of the distance between a given word and each entity of the pair. This feature was originally proposed in [1];
- We apply and implement the related architecture, dubbed Piecewise Convolutional Neural Network (PCNN) [2];
- This repository provides an implementation of the related architecture (Figure 2 above) as a framework that allows training models by matching a context (a group of sentences) with a sentiment label; it assumes that different sentence encoders may be utilized: CNN, PCNN, etc. [3].
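As an illustrative sketch (not the repository's actual code), the Position Feature can be thought of as a clipped signed distance from each term to an entity, shifted into a non-negative range so it can index a trainable embedding table; function names and the distance limit below are assumptions for illustration:

```python
# Sketch of the Position Feature (PF): for every term of a context we
# compute its clipped signed distance to an entity (Subject or Object);
# each distance then serves as an index into a trainable embedding table.
# All names and the max_dist value are illustrative, not from the repository.

def position_indices(terms_count, entity_pos, max_dist=10):
    """Clipped signed distances from each term to the entity at entity_pos,
    shifted into the non-negative range [0, 2 * max_dist]."""
    indices = []
    for i in range(terms_count):
        d = max(-max_dist, min(max_dist, i - entity_pos))
        indices.append(d + max_dist)
    return indices

# Example: a 7-term context with the Subject at index 1 and Object at index 5.
subj_pf = position_indices(7, entity_pos=1)
obj_pf = position_indices(7, entity_pos=5)
```

Each term thus receives two PF indices, one per entity of the attitude pair, which are embedded and concatenated with the word embedding.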
- Relation Classification via Convolutional Deep Neural Network [paper] [code] [review]
  - Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou and Jun Zhao
  - COLING 2014
- Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks [paper] [review] [code]
  - Daojian Zeng, Kang Liu, Yubo Chen and Jun Zhao
  - EMNLP 2015
- Relation Extraction with Multi-instance Multi-label Convolutional Neural Networks [paper] [review]
  - Xiaotian Jiang, Quan Wang, Peng Li, Bin Wang
  - COLING 2016
We utilize two different approaches:
- Single Sentence Training: matching a label with a single sentence.
  - Models: CNN, PCNN
- Multi Sentence Training: matching a label with a set of sentences.
  - Models: MI-CNN, MI-PCNN
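The multi-instance idea above can be sketched as follows: each sentence of a context is encoded independently, and the per-sentence representations are then aggregated (here by element-wise max, an assumed aggregation choice for illustration) before classification:

```python
# Sketch of the multi-instance (MI) training idea: encode every sentence
# of a context independently, then aggregate the sentence representations
# before the final classification layer. The aggregation by element-wise
# max and all names here are illustrative assumptions.
import numpy as np

def aggregate_context(sentence_vectors):
    """Element-wise max over the per-sentence encodings."""
    return np.max(np.stack(sentence_vectors), axis=0)

# Two sentences of one context, already encoded into 3-d vectors.
s1 = np.array([0.2, -0.5, 0.9])
s2 = np.array([0.4, -0.1, 0.3])
context_vec = aggregate_context([s1, s2])  # -> [0.4, -0.1, 0.9]
```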
Named entities related to the Subject and Object of a given attitude are masked. [proof]
For example, given an attitude 'USA'->'Russia' with the following context:
- Original: "... [USA] is considering the possibility of new sanctions against [Russia] ..."
- Masked: "... [MASK] is considering the possibility of new sanctions against [MASK] ..."
NOTE: Other named entities mentioned in the text remain non-masked.
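A minimal sketch of this masking step (function and token names are illustrative, not the repository's API): only the two entities participating in the attitude are replaced, while all other terms stay untouched:

```python
# Sketch of entity masking: only the named entities participating in the
# attitude (Subject and Object) are replaced with a mask token; any other
# named entity in the context stays as-is. Names are illustrative.

def mask_attitude_entities(terms, subj_idx, obj_idx, mask="[MASK]"):
    """Return a copy of terms with the Subject and Object terms masked."""
    masked = list(terms)
    masked[subj_idx] = mask
    masked[obj_idx] = mask
    return masked

terms = ["[USA]", "is", "considering", "sanctions", "against", "[Russia]"]
print(mask_attitude_entities(terms, subj_idx=0, obj_idx=5))
```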
The output of a trained model is a two-scale sentiment score: {pos, neg}. [see Note below]
NOTE: We adopt a model designed for sentiment attitude extraction which actually utilizes a three-scale output {pos, neg, neutral}; however, there were no attitudes with a neutral score in the train/test subsets. The resulting model may still return a neutral score, but the amount of related attitudes was insignificant.
Evaluation results, described in the paper [download]
IMPORTANT: Evaluation assumes treating only those pairs which could be found in the text within a single sentence, and also within a context limited by TermsPerContext.
In turn, attitude labeling in RuSentRel includes ones that could not be fitted and described by such a limited context.
Utilizing the latter in the evaluation process results in a decreased Recall and hence a decreased F1.
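The filtering described above can be sketched as follows; the limit value and function names are assumptions for illustration, with entity positions given as term indices in the document:

```python
# Sketch of the evaluation filtering: keep only those opinion pairs whose
# Subject and Object co-occur within a window of at most TERMS_PER_CONTEXT
# terms. The constant value and all names are illustrative assumptions.

TERMS_PER_CONTEXT = 50  # assumed value for illustration

def fits_single_context(subj_idx, obj_idx, limit=TERMS_PER_CONTEXT):
    """True if both entities fit into one context of `limit` terms."""
    return abs(subj_idx - obj_idx) < limit

# Pairs given as (subject term index, object term index).
pairs = [(3, 20), (5, 80)]
kept = [p for p in pairs if fits_single_context(*p)]  # -> [(3, 20)]
```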
For words we utilize a pretrained Word2Vec model:
RusVectores news embedding (vector_size=1000) [download]
The figure below illustrates the embedding parameters distribution for each POS class of the first 1000 (out of 147k) words in the embedding vocabulary. It is possible to see that the parameters follow a normal distribution.
NOTE: To generate token embedding vectors, we utilize numpy.random_sample in the [0, 1] region.
NOTE: To generate feature embedding vectors, we utilize tf.random_normal_initializer.
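These two initialization schemes can be sketched with numpy alone (shapes, the seed, and the stddev below are illustrative assumptions; the repository uses TensorFlow's initializer for the feature vectors):

```python
# Sketch of the two initialization schemes described above, using numpy
# only. Shapes, the seed, and the stddev are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)

# Token embeddings: uniform samples in [0, 1), which is what
# numpy.random_sample produces.
token_embedding = rng.random((10, 1000))

# Feature embeddings: normally distributed values, analogous to
# tf.random_normal_initializer (its defaults are mean=0.0, stddev=0.05).
feature_embedding = rng.normal(loc=0.0, scale=0.05, size=(21, 5))
```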
UPD September 29th, 2019: It was found that hidden states may vary outside the [-1, 1] region (see heatmap gif image below). In the case of CNN, a tf.random_normal call was used for initialization. Large values may negatively affect the gradient.
The figure below illustrates the variation in cost during the model training process in two experiments: RSR and RSR+RA (RSR+DEV in the figure).
The application of RuAttitudes in the RSR+RA experiment results in a faster training process in comparison with the RSR experiment:
UPD October 6th, 2019: Weights of CNN-based models are initialized with random_normal. Switching to xavier results in increased training speed, since weights start in the [-1, 1] region. However, the latter does not affect the results.
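The reason the xavier (Glorot) scheme keeps initial weights small can be sketched by its bound, which shrinks as the layer's fan-in and fan-out grow; the kernel size below is an illustrative assumption:

```python
# Sketch of the Xavier/Glorot uniform bound: weights are drawn from
# U(-limit, limit) with limit scaled by fan-in and fan-out, so initial
# values stay well inside [-1, 1]. The 300x300 shape is illustrative.
import numpy as np

def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the Glorot uniform distribution: sqrt(6 / (fan_in + fan_out))."""
    return np.sqrt(6.0 / (fan_in + fan_out))

limit = glorot_uniform_limit(300, 300)  # 0.1 for a 300x300 kernel
```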
UPD August 1st, 2019: The assumption here is the significantly increased scale of the training set in the case of the RSR+RA experiment.
Analysis of the difference set between RSR and RSR+RA results in:
- A list of the 25 most frequent verbs (V), nouns (N), frames (F), and opinions (O), separately for positive (+) and negative (-) classes;
- The related statistics combined with the frequencies of the related verbs/nouns/frames in RuAttitudes.
Bar candles illustrate the variation of frequencies across the different models: CNN, PCNN, MI-CNN, MI-PCNN.
NOTE: In the figure above, X-axis indices correspond to the terms presented in the list, or the figure below.
All related resources used in the experiment are presented in the data folder.
It is necessary to download and unpack the news embedding as follows:
cd data && ./install.sh
@inproceedings{rusnachenko-etal-2019-distant,
title = "Distant Supervision for Sentiment Attitude Extraction",
author = "Rusnachenko, Nicolay and
Loukachevitch, Natalia and
Tutubalina, Elena",
booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)",
month = sep,
year = "2019",
address = "Varna, Bulgaria",
publisher = "INCOMA Ltd.",
url = "https://www.aclweb.org/anthology/R19-1118",
doi = "10.26615/978-954-452-056-4_118",
pages = "1022--1030"
}