Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

Cited by: 72
Authors
Deriu, Jan [1 ]
Lucchi, Aurelien [2 ]
De Luca, Valeria [2 ]
Severyn, Aliaksei [3 ]
Muller, Simon [4 ]
Cieliebak, Mark [4 ]
Hofmann, Thomas [2 ]
Jaggi, Martin [5 ]
Affiliations
[1] ZHAW, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] Google Res, Zurich, Switzerland
[4] SpinningBytes AG, Kusnacht, Switzerland
[5] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'17) | 2017
Keywords
Sentiment classification; multi-language; weak supervision; neural networks;
DOI
10.1145/3038912.3052611
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task, as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of pretraining such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse, but still acceptable, performance when compared to the single-language model, while benefiting from better generalization properties across languages.
Pages: 1045-1052
Page count: 8