An intelligent early warning system of analyzing Twitter data using machine learning on COVID-19 surveillance in the US

Cited by: 30
Authors
Zhang, Yiming [1, 2]
Chen, Ke [1, 2]
Weng, Ying [1]
Chen, Zhuo [3, 4]
Zhang, Juntao [1, 2]
Hubbard, Richard [2]
Affiliations
[1] Univ Nottingham, Fac Sci & Engn, Sch Comp Sci, Ningbo, Peoples R China
[2] Univ Nottingham, Fac Med & Hlth Sci, Sch Med, Nottingham, England
[3] Univ Georgia, Dept Hlth Policy & Management, Athens, GA USA
[4] Univ Nottingham Ningbo China, Fac Humanities & Social Sci, Sch Econ, Ningbo, Peoples R China
Keywords
COVID-19; surveillance; Early warning system; Text classification; BERT; Epidemic intelligence; OUTBREAK;
DOI
10.1016/j.eswa.2022.116882
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
On 11 March 2020, the World Health Organization (WHO) declared the spread of coronavirus disease 2019 (COVID-19) a pandemic. Traditional infectious disease surveillance had failed to alert public health authorities in time to intervene, mitigate, and control COVID-19 before it became a pandemic. Compared with traditional public health surveillance, the rich data available from social media, including Twitter, has been considered a useful tool that can overcome the limitations of the traditional surveillance system. This paper proposes an intelligent COVID-19 early warning system that applies novel machine learning methods to Twitter data. We fine-tune BERT, a pre-trained natural language processing (NLP) model, as a Twitter classification method. We also implement a COVID-19 forecasting model, a Twitter-based linear regression model, to detect early signs of a COVID-19 outbreak. Finally, we develop an expert system, an early warning web application based on the proposed methods. The experimental results suggest that it is feasible to use Twitter data for COVID-19 surveillance and prediction in the US to support health departments' decision-making.
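The abstract names a Twitter-based linear regression model but gives no specification, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a single predictor (a daily count of COVID-19-related tweets), a fixed lead time `lag` in days, and ordinary least squares. The function names `fit_lagged_ols` and `forecast` and all data values are hypothetical and synthetic.

```python
from typing import List, Tuple


def fit_lagged_ols(tweets: List[float], cases: List[float], lag: int) -> Tuple[float, float]:
    """Fit cases[t] = intercept + slope * tweets[t - lag] by ordinary least squares.

    Assumption: one predictor and a fixed lag; the paper's actual features are not
    described in the abstract.
    """
    if lag < 0 or len(tweets) != len(cases) or len(tweets) - lag < 2:
        raise ValueError("need equal-length series with at least two lagged pairs")
    x = tweets[: len(tweets) - lag]  # tweet counts observed `lag` days earlier
    y = cases[lag:]                  # case counts observed `lag` days later
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("tweet counts are constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(intercept: float, slope: float, tweet_count: float) -> float:
    """Predict the case count `lag` days after observing `tweet_count` tweets."""
    return intercept + slope * tweet_count


# Synthetic, illustrative series (not data from the paper).
tweets = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
cases = [0.0, 0.0, 25.0, 45.0, 65.0, 85.0]

intercept, slope = fit_lagged_ols(tweets, cases, lag=2)
next_cases = forecast(intercept, slope, tweets[-1])
print(intercept, slope, next_cases)
```

In this sketch the lag is what makes the signal "early": tweet volume on a given day is paired with case counts observed `lag` days later, so today's tweet count yields a forecast for a future date.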
Pages: 11