Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

Cited by: 62
Authors
Fang, Zheng-gang [1]
Yang, Shu-qin [1]
Lv, Cai-xia [1]
An, Shu-yi [2]
Wu, Wei [1]
Affiliations
[1] China Med Univ, Dept Epidemiol, Shenyang, Peoples R China
[2] Liaoning Prov Ctr Dis Control & Prevent, Dept Social Med & Hlth, Shenyang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
COVID-19; epidemiology
DOI
10.1136/bmjopen-2021-056685
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Objective: The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more accurate for anticipating the occurrence of COVID-19 in the USA.
Design: Time-series study.
Setting: The USA was the setting for this study.
Main outcome measures: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were applied to evaluate the performance of the two models.
Results: In our study, for the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were less than those of the ARIMA model.
Conclusions: The XGBoost model can help improve prediction of COVID-19 cases in the USA over the ARIMA model.
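The three accuracy metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the sample case counts, and the two sets of forecasts (`model_a`, `model_b`) are hypothetical values chosen only to show how the metrics are computed and compared. MAPE is expressed here as a percentage and assumes no observed value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every value in `actual` is non-zero.
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical observed case counts and two hypothetical sets of forecasts.
actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 380.0]
model_b = [130.0, 160.0, 340.0]

for name, forecast in (("model_a", model_a), ("model_b", model_b)):
    print(
        name,
        round(mae(actual, forecast), 3),
        round(rmse(actual, forecast), 3),
        round(mape(actual, forecast), 3),
    )
```

A model is judged more accurate on a given data split when all three values are lower, which is the comparison the abstract reports between XGBoost and ARIMA.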
Pages: 8
Related papers
50 in total
[21]   A framework for data-driven solutions with covid-19 illustrations [J].
Mwitondi K.S. ;
Said R.A. .
Data Science Journal, 2021, 20 (01)
[22]   Forecasting COVID-19 pandemic: A data-driven analysis [J].
Nabi, Khondoker Nazmoon .
CHAOS SOLITONS & FRACTALS, 2020, 139
[23]   Smart cities and a data-driven response to COVID-19 [J].
James, Philip ;
Das, Ronnie ;
Jalosinska, Agata ;
Smith, Luke .
DIALOGUES IN HUMAN GEOGRAPHY, 2020, 10 (02) :255-259
[24]   Generated time-series prediction data of COVID-19's daily infections in Brazil by using recurrent neural networks [J].
Hawas, Mohamed .
DATA IN BRIEF, 2020, 32
[25]   Impact of COVID-19 on access to cancer care in Rwanda: a retrospective time-series study using electronic medical records data [J].
Habinshuti, Placide ;
Nshimyiryo, Alphonse ;
Fejfar, Donald Luke ;
Niyigena, Anne ;
Cubaka, Vincent K. ;
Karema, Nadine ;
Bigirimana, Jean Bosco ;
Shyirambere, Cyprien ;
Barnhart, Dale A. ;
Kateera, Fredrick ;
Fulcher, Isabel .
BMJ OPEN, 2022, 12 (12)
[26]   An XGBoost Model for Age Prediction from COVID-19 Blood Test [J].
Qomariyah, Nunung Nurul ;
Purwita, Ardimas Andi ;
Astriani, Maria Seraphina ;
Asri, Sri Dhuny Atas ;
Kazakov, Dimitar .
2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020
[27]   Application of machine learning time series analysis for prediction COVID-19 pandemic [J].
Chaurasia V. ;
Pal S. .
Research on Biomedical Engineering, 2022, 38 (01) :35-47
[28]   A Data-Driven Digital Application to Enhance the Capacity Planning of the COVID-19 Vaccination Process [J].
Markhorst, Berend ;
Zver, Tara ;
Malbasic, Nina ;
Dijkstra, Renze ;
Otto, Daan ;
van der Mei, Rob ;
Moeke, Dennis .
VACCINES, 2021, 9 (10)
[29]   A data-driven interpretable ensemble framework based on tree models for forecasting the occurrence of COVID-19 in the USA [J].
Zheng, Hu-Li ;
An, Shu-Yi ;
Qiao, Bao-Jun ;
Guan, Peng ;
Huang, De-Sheng ;
Wu, Wei .
ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, 2023, 30 (05) :13648-13659