CMTA: COVID-19 Misinformation Multilingual Analysis on Twitter

Cited by: 0
Authors
Pranesh, Raj Ratn [1]
Farokhnejad, Mehrdad [2]
Shekhar, Ambesh [1]
Vargas-Solar, Genoveva [3]
Affiliations
[1] Birla Inst Technol, Mesra, India
[2] Univ Grenoble Alpes, LIG, CNRS, Grenoble, France
[3] LIRIS LAFMIA Lyon, CNRS, Lyon, France
Source
ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP | 2021
Keywords
HEALTH INFORMATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The internet has become an essential source of health knowledge for people around the world during the coronavirus disease (COVID-19) pandemic. During pandemic situations, myths, sensationalism, rumours and misinformation, generated intentionally or unintentionally, spread rapidly through social networks. Twitter is one of the popular social networks that people use to share COVID-19 related news, information, and thoughts that reflect their perceptions and opinions about the pandemic. Analysing tweets to identify misinformation can yield useful insight into the quality and readability of online information about COVID-19. This paper presents a multilingual COVID-19 related tweet analysis method, CMTA, that uses BERT, a deep learning model, for multilingual tweet misinformation detection and classification. CMTA extracts features from multilingual textual data, which are then categorized into specific information classes. Classification is done by a Dense-CNN model trained on tweets manually annotated into information classes (i.e., 'false', 'partly false', 'misleading'). The paper presents an analysis of multilingual tweets from February to June, showing how the types of information are distributed across different languages. To assess the performance of the CMTA multilingual model, we performed a comparative analysis of eight monolingual models and CMTA on the misinformation detection task. The results show that the proposed CMTA model outperforms various monolingual models, which supports the claim that a multilingual framework can be developed through transfer learning.
Pages: 270-283
Number of pages: 14