Leveraging Emoji to Improve Sentiment Classification of Tweets

Cited by: 5
Authors
de Barros, Tiago Martinho [1]
Pedrini, Helio [1]
Dias, Zanoni [1]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil
Source
36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021 | 2021
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Natural Language Processing; Sentiment Analysis; Emoji; Social Media
DOI
10.1145/3412841.3441960
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recent advances in Natural Language Processing have brought good results on a number of tasks, for instance Linguistic Acceptability, Question Answering, Reading Comprehension, Natural Language Inference, and Sentiment Analysis. Methods such as ULMFiT, ELMo, BERT, and their derivatives have achieved increasing success on these tasks, but they often require substantial amounts of pre-training data and computational resources. We propose a novel methodology to classify the sentiment of tweets, based on BERT but focusing on emojis, treating them as an important source of sentiment rather than as simple input tokens. Additionally, a previously pre-trained BERT model can be used to warm-start ours, greatly reducing the training time required. Experiments on two Brazilian Portuguese datasets, TweetSentBR and 2000-tweets-BR, show that our methodology produces better results than BERT and outperforms the previously published results for both datasets, establishing new state-of-the-art results: on TweetSentBR, accuracy of 0.7577 (an absolute improvement of 4.8 percentage points) and F-1 score of 0.7395 (an absolute improvement of 8.4 percentage points); on 2000-tweets-BR, accuracy of 0.8316 (an absolute improvement of 15.2 percentage points) and F-1 score of 0.8151 (an absolute improvement of 24.5 percentage points).
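As a reading aid, the percentage-point gains quoted in the abstract can be turned back into the scores they imply for the earlier best results. The sketch below does only that arithmetic. The names `REPORTED`, `implied_previous_best`, and `relative_gain` are hypothetical, and the derived baselines are approximations (the gains are rounded to one decimal place), not figures stated in the abstract.

```python
# Scores and absolute gains (in percentage points) copied from the abstract.
REPORTED = {
    ("TweetSentBR", "accuracy"): (0.7577, 4.8),
    ("TweetSentBR", "f1"): (0.7395, 8.4),
    ("2000-tweets-BR", "accuracy"): (0.8316, 15.2),
    ("2000-tweets-BR", "f1"): (0.8151, 24.5),
}


def implied_previous_best(score: float, gain_pp: float) -> float:
    """Earlier best score implied by an absolute gain in percentage points.

    One percentage point equals 0.01 on a 0-to-1 score scale.
    """
    return round(score - gain_pp / 100.0, 4)


def relative_gain(score: float, gain_pp: float) -> float:
    """Relative improvement over the implied earlier score (as a fraction)."""
    previous = score - gain_pp / 100.0
    return (score - previous) / previous


if __name__ == "__main__":
    for (dataset, metric), (score, gain_pp) in REPORTED.items():
        previous = implied_previous_best(score, gain_pp)
        rel = relative_gain(score, gain_pp)
        # Derived, approximate values; not reported in the abstract.
        print(f"{dataset:15s} {metric:8s} new={score:.4f} "
              f"implied_previous~{previous:.4f} relative_gain~{rel:.1%}")
```

The distinction matters when comparing the two datasets: an absolute gain is a difference between scores, whereas a relative gain divides that difference by the earlier score, so the same number of percentage points reads as a larger relative change when the baseline is lower.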
Pages: 845-852
Number of pages: 8