A Survey of Cross-lingual Sentiment Analysis: Methodologies, Models and Evaluations

Cited by: 25
Authors
Xu, Yuemei [1]
Cao, Han [1]
Du, Wanze [1]
Wang, Wenqing [1]
Affiliations
[1] Beijing Foreign Studies Univ, Sch Informat Sci & Technol, Beijing 100089, Peoples R China
Keywords
Cross-lingual; Sentiment analysis; Bilingual word embedding; Classification
DOI
10.1007/s41019-022-00187-3
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Cross-lingual sentiment analysis (CLSA) leverages one or several source languages to help low-resource languages perform sentiment analysis, thereby alleviating the lack of annotated corpora in many non-English languages. With the development of economic globalization, CLSA has attracted much attention in the field of sentiment analysis, and the last decade has seen a surge of research in this area. Numerous methods, datasets and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and up-to-date survey. This paper fills that gap by reviewing state-of-the-art CLSA approaches from 2004 to the present. It traces the research context of CLSA and elaborates on the following methods in detail: (1) the main early methods of CLSA, including those based on machine translation and its improved variants, parallel corpora, or bilingual sentiment lexicons; (2) CLSA based on cross-lingual word embeddings; (3) CLSA based on multi-BERT and other pre-trained models. We further analyze their main ideas, methodologies and shortcomings, and attempt to draw conclusions about language coverage, datasets and performance. Finally, we look ahead to the future development of CLSA and the challenges facing the research area.
Pages: 279-299
Number of pages: 21