Adapting recurrent neural networks for classifying public discourse on COVID-19 symptoms in Twitter content

Cited by: 7
Authors
Amin, Samina [1]
Alharbi, Abdullah [2]
Uddin, M. Irfan [1]
Alyami, Hashem [3]
Affiliations
[1] Kohat Univ Sci & Technol, Inst Comp, Kohat 2600, Pakistan
[2] Taif Univ, Coll Comp & Informat Technol, Dept Informat Technol, POB 11099, Taif 21944, Saudi Arabia
[3] Taif Univ, Coll Comp & Informat Technol, Dept Comp Sci, POB 11099, Taif 21944, Saudi Arabia
Keywords
Deep learning; Coronavirus; Pandemic; COVID-19; Classification; Recurrent neural networks; Twitter; TWEETS; CORONAVIRUS
DOI
10.1007/s00500-022-07405-0
CLC (Chinese Library Classification) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The COVID-19 infection, which began in December 2019, has claimed many lives and affected all aspects of human life. The World Health Organization (WHO) subsequently declared COVID-19 a pandemic, placing massive pressure on global health systems. During this ongoing pandemic, the exponential growth of social media platforms has provided a valuable channel for distributing information, as well as a source of self-reported disease symptoms in public discourse. There is therefore an urgent need for effective approaches to detect self-reported symptoms or cases in social media content. In this study, we scraped public discourse on COVID-19 symptoms from Twitter. From it, we built a large dataset of self-reported COVID-19 symptoms and gold-annotated the tweets into four categories: confirmed, death, suspected, and recovered. We then applied machine learning and deep learning models, each with its own feature representation. Experiments were conducted with recurrent neural network (RNN) variants, and their performance was compared with that of traditional machine learning algorithms. Experimental results show that optimizing the area under the curve (AUC) improves model performance and that the long short-term memory (LSTM) network achieves the highest accuracy in detecting COVID-19 symptoms in real-time public messaging. In the proposed pipeline, the LSTM classifier achieves a classification accuracy of 90.7%, outperforming existing state-of-the-art algorithms for multi-class classification.
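The abstract does not specify the network's architecture, feature extraction, or hyperparameters. As a rough illustration only, the following pure-Python sketch shows the forward pass of a single-layer LSTM that maps a tokenized tweet (a list of integer token ids) to probabilities over the four annotated categories. The embedding size, hidden size, vocabulary size, token ids, and the helper names `init_params` and `lstm_classify` are hypothetical; the weights are random and untrained, and training, AUC-oriented optimization, and preprocessing are omitted. This is not the authors' implementation.

```python
import math
import random

# The four gold-annotation categories named in the abstract.
LABELS = ["confirmed", "death", "suspected", "recovered"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vs):
    # Element-wise sum of equal-length vectors.
    return [sum(t) for t in zip(*vs)]

def init_params(vocab_size, emb_dim, hidden, n_classes, seed=0):
    # Hypothetical sizes; small random (untrained) weights for illustration.
    rng = random.Random(seed)
    def mat(r, c):
        return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
    p = {"E": mat(vocab_size, emb_dim)}  # token embedding table
    for g in ("i", "f", "o", "c"):       # input, forget, output gates, candidate
        p["W" + g] = mat(hidden, emb_dim)  # input-to-hidden weights
        p["U" + g] = mat(hidden, hidden)   # hidden-to-hidden weights
        p["b" + g] = [0.0] * hidden
    p["Wy"] = mat(n_classes, hidden)       # hidden-to-class projection
    p["by"] = [0.0] * n_classes
    return p

def lstm_classify(token_ids, p):
    # Run the LSTM over the token sequence, then softmax the final hidden state.
    hidden = len(p["bi"])
    h = [0.0] * hidden
    c = [0.0] * hidden
    for t in token_ids:
        x = p["E"][t]
        i = [sigmoid(z) for z in add(matvec(p["Wi"], x), matvec(p["Ui"], h), p["bi"])]
        f = [sigmoid(z) for z in add(matvec(p["Wf"], x), matvec(p["Uf"], h), p["bf"])]
        o = [sigmoid(z) for z in add(matvec(p["Wo"], x), matvec(p["Uo"], h), p["bo"])]
        g = [math.tanh(z) for z in add(matvec(p["Wc"], x), matvec(p["Uc"], h), p["bc"])]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # new cell state
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                # new hidden state
    logits = add(matvec(p["Wy"], h), p["by"])
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Usage with hypothetical token ids (all below vocab_size).
params = init_params(vocab_size=50, emb_dim=8, hidden=16, n_classes=len(LABELS))
probs = lstm_classify([3, 17, 42, 5], params)
print(LABELS[probs.index(max(probs))], [round(q, 3) for q in probs])
```

Because the weights are untrained, the predicted label here is arbitrary; in practice the parameters would be learned from the annotated tweets, typically with a deep learning framework rather than hand-written loops.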
Pages: 11077-11089
Number of pages: 13