Automatic detection of fake tweets about the COVID-19 Vaccine in Portuguese

Cited by: 1

Authors:

Geurgas, Rafael [1]

Tessler, Leandro R. [1]

Affiliations:

[1] Univ Estadual Campinas, IFGW, BR-13083970 Campinas, SP, Brazil

Source:

SOCIAL NETWORK ANALYSIS AND MINING | 2024 / Volume 14 / Issue 01

Funding:

São Paulo Research Foundation (Brazil);

Keywords:

Disinformation; COVID-19; Neural networks; Automatic classification;

DOI:

10.1007/s13278-024-01216-x

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The COVID-19 pandemic induced an unprecedented wave of disinformation on social media in Brazil. In particular, Twitter (currently X) was used to spread fake news about COVID-19 vaccines, which contributed to vaccine hesitancy. This article presents a BERT-based neural network for the automatic detection of fake tweets. The optimized architecture relies on BERTimbau, a BERT implementation pre-trained on Brazilian Portuguese, fine-tuned with three fully connected layers. All 2,857,908 tweets in Portuguese containing the word vacina (vaccine in Portuguese) were collected over 7 months. A random subset of 16,731 tweets was manually classified as real or fake. Of these, 2,309 were discarded for being about non-COVID-19 vaccines and 422 were discarded for containing irony. Of the remaining 14,000 tweets, 1,144 were labeled fake and 12,856 real. To balance the training dataset, the network was fine-tuned using the 1,144 curated fake tweets and a random sample of 2,000 real tweets. Optimal results were achieved by unfreezing the last four layers of BERTimbau. The best results obtained were a 77.1% F1-score and 76.9% accuracy. These results are already acceptable for practical applications and can be improved by increasing the size of the training dataset: a weighted F1-score of 96.3% was obtained by training the same neural network architecture, with the same hyperparameters, on a larger curated and balanced English-language training dataset.
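The dataset accounting and balancing step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`remaining_after_filtering`, `balanced_sample`, `accuracy_and_f1`) are hypothetical, integer IDs stand in for tweet texts, and the metric shown is the plain binary F1-score for the fake class, which may differ from the averaging the authors used.

```python
import random

# Counts reported in the abstract.
LABELED = 16_731    # manually classified tweets
NON_COVID = 2_309   # discarded: about non-COVID-19 vaccines
IRONIC = 422        # discarded: contained irony
FAKE = 1_144        # remaining tweets labeled fake
REAL = 12_856       # remaining tweets labeled real


def remaining_after_filtering(labeled, *discarded):
    """Number of tweets left once the discarded groups are removed."""
    return labeled - sum(discarded)


def balanced_sample(fake, real, n_real, seed=0):
    """Keep every fake item and draw n_real real items without replacement.

    Hypothetical helper: the abstract does not describe the sampling code.
    Labels: 1 = fake, 0 = real.
    """
    rng = random.Random(seed)
    sampled_real = rng.sample(real, n_real)
    data = [(x, 1) for x in fake] + [(x, 0) for x in sampled_real]
    rng.shuffle(data)
    return data


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F1-score for the positive (fake) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# 16,731 - 2,309 - 422 = 14,000 = 1,144 + 12,856
kept = remaining_after_filtering(LABELED, NON_COVID, IRONIC)

# Placeholder integer IDs stand in for the tweet texts.
train = balanced_sample(range(FAKE), range(FAKE, FAKE + REAL), n_real=2_000)
fake_share = sum(label for _, label in train) / len(train)
```

With these counts the fine-tuning set holds 3,144 tweets, of which about 36% are fake, compared with about 8% in the 14,000 curated tweets.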


Pages: 10
