Automatic detection of fake tweets about the COVID-19 Vaccine in Portuguese

Times cited: 1
Authors
Geurgas, Rafael [1]
Tessler, Leandro R. [1]
Affiliation
[1] Univ Estadual Campinas, IFGW, BR-13083970 Campinas, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Disinformation; COVID-19; Neural networks; Automatic classification
DOI
10.1007/s13278-024-01216-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The COVID-19 pandemic induced an unprecedented wave of disinformation on social media in Brazil. In particular, Twitter (now X) was used to spread fake news about COVID-19 vaccines, which contributed to vaccine hesitancy. This article presents a BERT-based neural network for the automatic detection of fake tweets. The optimized architecture relies on BERTimbau, a BERT model pre-trained on Brazilian Portuguese, fine-tuned with three additional fully connected layers. All 2,857,908 Portuguese-language tweets containing the word vacina (Portuguese for "vaccine") were collected over 7 months. A random subset of 16,731 tweets was manually classified as real or fake. Of these, 2309 were discarded for being about vaccines other than COVID-19 vaccines, and 422 were discarded for containing irony. Of the remaining 14,000 tweets, 1144 were labeled fake and 12,856 real. To balance the training dataset, the network was fine-tuned using the 1144 curated fake tweets and a random sample of 2000 real tweets. Optimal results were achieved by unfreezing the last four layers of BERTimbau. The best results obtained were a 77.1% F1-score and 76.9% accuracy. These results are already acceptable for practical applications, and they can be improved further by increasing the size of the training dataset. Training the same architecture with the same hyperparameters on a larger, curated, balanced English-language dataset yielded a weighted F1-score of 96.3%.
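The data-curation arithmetic and the class-balancing step described in the abstract can be checked and sketched in a few lines. The following is a minimal Python sketch under stated assumptions, not the authors' code: it verifies the reported counts, builds a balanced pool by keeping every fake tweet and randomly undersampling the real ones, and computes accuracy and support-weighted F1, the metrics quoted in the results. The helper names (`balanced_subset`, `f1_for_class`, `weighted_f1`), the fixed random seed and the placeholder tweet IDs are all hypothetical; the BERTimbau fine-tuning itself is not reproduced.

```python
import random
from collections import Counter

# Counts reported in the abstract.
ANNOTATED = 16_731   # manually classified tweets
OFF_TOPIC = 2_309    # discarded: about non-COVID-19 vaccines
IRONIC = 422         # discarded: contained irony
FAKE, REAL = 1_144, 12_856
N_REAL_SAMPLED = 2_000


def balanced_subset(fake_ids, real_ids, n_real, seed=0):
    """Keep every fake tweet (label 1) and randomly undersample real tweets (label 0)."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    sampled_real = rng.sample(real_ids, n_real)
    data = [(t, 1) for t in fake_ids] + [(t, 0) for t in sampled_real]
    rng.shuffle(data)
    return data


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class (0.0 if it is never correctly hit)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * n for c, n in support.items()) / len(y_true)


if __name__ == "__main__":
    # Sanity-check the curation arithmetic from the abstract.
    assert ANNOTATED - OFF_TOPIC - IRONIC == FAKE + REAL == 14_000
    # Hypothetical placeholder IDs stand in for the actual tweet texts.
    fake_ids = [f"fake-{k}" for k in range(FAKE)]
    real_ids = [f"real-{k}" for k in range(REAL)]
    data = balanced_subset(fake_ids, real_ids, N_REAL_SAMPLED)
    print(len(data), Counter(label for _, label in data))
```

Running it prints the size of the balanced pool, 3,144 tweets (1,144 fake and 2,000 real), and its label counts; how that pool was split for training and evaluation is not stated in the abstract. Library implementations such as scikit-learn's `f1_score(..., average="weighted")` compute the same support-weighted average as `weighted_f1`.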
Pages: 10