A Survey of Cross-lingual Sentiment Analysis: Methodologies, Models and Evaluations

Cited by: 25

Authors:

Xu, Yuemei [1]

Cao, Han [1]

Du, Wanze [1]

Wang, Wenqing [1]

Affiliations:

[1] Beijing Foreign Studies Univ, Sch Informat Sci & Technol, Beijing 100089, Peoples R China

Source:

DATA SCIENCE AND ENGINEERING | 2022 / Vol. 7 / Issue 03

Keywords:

Cross-lingual; Sentiment analysis; Bilingual word embedding; CLASSIFICATION;

DOI:

10.1007/s41019-022-00187-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Cross-lingual sentiment analysis (CLSA) leverages one or several source languages to help low-resource languages perform sentiment analysis, thereby alleviating the lack of annotated corpora in many non-English languages. With the development of economic globalization, CLSA has attracted much attention in the field of sentiment analysis, and the last decade has seen a surge of research in this area. Numerous methods, datasets and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art CLSA approaches from 2004 to the present. It traces the research context of cross-lingual sentiment analysis and elaborates on the following methods in detail: (1) the early main methods of CLSA, including those based on machine translation and its improved variants, parallel corpora or bilingual sentiment lexicons; (2) CLSA based on cross-lingual word embedding; (3) CLSA based on multi-BERT and other pre-trained models. We further analyze their main ideas, methodologies and shortcomings, and attempt to reach a conclusion on the coverage of languages, datasets and their performance. Finally, we look into the future development of CLSA and the challenges facing the research area.


Pages: 279-299

Page count: 21

References: 82 in total
