An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

Times cited: 79

Authors:

Pota, Marco [1]

Ventura, Mirko [1]

Catelli, Rosario [1, 2]

Esposito, Massimo [1]

Affiliations:

[1] CNR, Inst High Performance Comp & Networking ICAR, I-80131 Naples, Italy

[2] Univ Naples Federico II, Dept Elect Engn & Informat Technol DIETI, I-80125 Naples, Italy

Source:

SENSORS | 2021 / Vol. 21 / No. 01

Keywords:

sentiment analysis; NLP; language models; BERT; Italian language; QUESTION CLASSIFICATION;

DOI:

10.3390/s21010133

Chinese Library Classification (CLC) number:

O65 [Analytical chemistry];

Subject classification code:

070302 ; 081704 ;

Abstract:

Over the last decade, industrial and academic communities have increased their focus on sentiment analysis techniques, especially as applied to tweets. State-of-the-art results have recently been achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle Twitter jargon. This work introduces a different, two-step approach to Twitter sentiment analysis. First, the tweet jargon, including emojis and emoticons, is transformed into plain text, using procedures that are language-independent or easily applicable to different languages. Second, the resulting tweets are classified using the BERT language model pre-trained on plain text rather than on tweets, for two reasons: (1) models pre-trained on plain text are readily available in many languages, which avoids the resource- and time-consuming training of a model from scratch directly on tweets; (2) available plain-text corpora are larger than tweet-only ones, therefore allowing better performance. A case study applying the approach to Italian is presented, together with a comparison against other existing Italian solutions. The results show the effectiveness of the approach and indicate that, thanks to its general methodological basis, it can also be promising for other languages.
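The first step of the pipeline (turning tweet jargon into plain text) can be illustrated with a minimal sketch. This is not the authors' procedure: the emoticon table, its Italian glosses, and the `<url>` / `<user>` placeholder tokens are hypothetical choices made for illustration. Emoji are replaced with their English Unicode character names via Python's standard `unicodedata` module; an Italian pipeline would presumably need Italian descriptions instead. The second step (BERT classification) is not shown.

```python
import re
import unicodedata

# Hypothetical emoticon-to-text table; the Italian glosses are illustrative only.
EMOTICONS = {
    ":-)": "sorriso",
    ":)": "sorriso",
    ":-(": "tristezza",
    ":(": "tristezza",
    ":D": "risata",
    ";)": "occhiolino",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Try longer emoticons first so that ":-)" wins over any shorter alternative.
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def emoji_to_text(text: str) -> str:
    """Replace characters in Unicode category 'So' (Symbol, other), which
    covers most emoji, with their lowercased Unicode character names."""
    out = []
    for ch in text:
        if ch in ("\ufe0f", "\u200d"):
            continue  # drop emoji variation selector and zero-width joiner
        if unicodedata.category(ch) == "So":
            out.append(" " + unicodedata.name(ch, "").lower() + " ")
        else:
            out.append(ch)
    return "".join(out)


def tweet_to_plain_text(tweet: str) -> str:
    """Hypothetical normalization: URLs and mentions become placeholder tokens,
    hashtags lose the '#', emoticons and emoji become words."""
    text = URL_RE.sub(" <url> ", tweet)   # URLs first, so ":/" etc. are not touched later
    text = MENTION_RE.sub(" <user> ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = EMOTICON_RE.sub(lambda m: " " + EMOTICONS[m.group(0)] + " ", text)
    text = emoji_to_text(text)
    return " ".join(text.split())         # collapse repeated whitespace


if __name__ == "__main__":
    print(tweet_to_plain_text("Che bella giornata :) 😂 #sole @mario https://t.co/xyz"))
```

Run on the sample tweet, the sketch prints `Che bella giornata sorriso face with tears of joy sole <user> <url>`, a string that a BERT model pre-trained on ordinary text can tokenize without any tweet-specific vocabulary.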


Pages: 1-21

Page count: 21

Number of references: 92
