A comparative study of machine translation for multilingual sentence-level sentiment analysis

Times cited: 66
Authors
Araujo, Matheus [1,2]
Pereira, Adriano [2]
Benevenuto, Fabricio [2]
Affiliations
[1] Univ Minnesota, Minneapolis, MN 55455 USA
[2] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
Keywords
Sentiment analysis; Multilingual; Machine translation; Online social networks; Opinion mining
DOI
10.1016/j.ins.2019.10.031
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis has become a key tool for several social media applications, including analysis of users' opinions about products and services, support for political campaigns, and even identification of market trends. Existing sentiment analysis methods explore different techniques, usually relying on lexical resources or learning approaches. Despite the significant interest in this theme and the amount of research in the field, almost all existing methods are designed to work only with English content. Most current strategies for other languages consist of adapting existing lexical resources, without proper validation or basic baseline comparisons. In this work, we take a different step into this field: we evaluate existing efforts for language-specific sentiment analysis against a simple yet effective baseline approach. To do so, we evaluated sixteen methods for sentence-level sentiment analysis proposed for English and compared them with three language-specific methods. Based on fourteen human-labeled language-specific datasets, we provide an extensive quantitative analysis of existing multilingual approaches. Our results suggest that simply translating the input text from a specific language into English and then applying one of the best existing methods developed for English can be better than the language-specific approaches evaluated. We also rank methods according to their prediction performance and identify those that achieved the best results using machine translation across different languages. As a final contribution to the research community, we release our code, datasets, and the iFeel 3.0 system, a Web framework and tool for multilingual sentence-level sentiment analysis. We hope our system establishes a new baseline for future sentence-level methods developed for a wide set of languages. (C) 2019 Elsevier Inc. All rights reserved.
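The translate-then-classify baseline described in the abstract can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: `toy_translate` is a hypothetical word-for-word Portuguese-to-English table standing in for a real machine translation service, `english_lexicon_polarity` is a hypothetical toy lexicon classifier standing in for one of the English methods, and ranking by macro-F1 is assumed here as the prediction-performance measure.

```python
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL: toy Portuguese->English table standing in for a real
# machine translation service; unknown words pass through untranslated.
TOY_PT_EN: Dict[str, str] = {
    "eu": "i", "amo": "love", "odeio": "hate", "este": "this",
    "filme": "movie", "ótimo": "great", "péssimo": "terrible",
}

# HYPOTHETICAL: tiny English polarity lexicon standing in for an
# existing English sentence-level sentiment method.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "terrible", "bad"}
LABELS = ["positive", "negative", "neutral"]

Classifier = Callable[[str], str]


def toy_translate(text: str) -> str:
    """Lowercase the input and translate it word by word using the toy table."""
    return " ".join(TOY_PT_EN.get(word, word) for word in text.lower().split())


def english_lexicon_polarity(text: str) -> str:
    """Label an English sentence by counting positive minus negative words."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def translate_then_classify(text: str, translate: Callable[[str], str],
                            classify: Classifier) -> str:
    """Baseline: machine-translate the sentence to English, then apply an English method."""
    return classify(translate(text))


def macro_f1(gold: List[str], pred: List[str], labels: List[str]) -> float:
    """Unweighted mean of per-class F1; a class with zero precision and recall scores 0."""
    f1_scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


def rank_methods(methods: Dict[str, Classifier], sentences: List[str],
                 gold: List[str], translate: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Score each English method on translated input and sort best first."""
    scores = {
        name: macro_f1(gold, [translate_then_classify(s, translate, clf) for s in sentences], LABELS)
        for name, clf in methods.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sentences = ["Eu amo este filme", "Eu odeio este filme", "Este filme"]
    gold = ["positive", "negative", "neutral"]
    methods = {
        "toy_lexicon": english_lexicon_polarity,
        "always_positive": lambda text: "positive",  # trivial reference point
    }
    for name, score in rank_methods(methods, sentences, gold, toy_translate):
        print(f"{name}: macro-F1 = {score:.3f}")
```

Replacing `toy_translate` with a real translation service and the toy lexicon with any English method leaves the pipeline unchanged; only the two callables passed to `translate_then_classify` differ.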
Pages: 1078-1102
Number of pages: 25