Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

Citations: 50
Authors
Fang, Zheng-gang [1]
Yang, Shu-qin [1]
Lv, Cai-xia [1]
An, Shu-yi [2]
Wu, Wei [1]
Affiliations
[1] China Med Univ, Dept Epidemiol, Shenyang, Peoples R China
[2] Liaoning Prov Ctr Dis Control & Prevent, Dept Social Med & Hlth, Shenyang, Peoples R China
Source
BMJ OPEN | 2022 / Vol. 12 / Issue 07
Funding
National Natural Science Foundation of China;
Keywords
COVID-19; epidemiology;
DOI
10.1136/bmjopen-2021-056685
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
Objective: The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more accurate for anticipating the occurrence of COVID-19 in the USA.
Design: Time-series study.
Setting: The USA was the setting for this study.
Main outcome measures: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were applied to evaluate the performance of the two models.
Results: In our study, for the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were less than those of the ARIMA model.
Conclusions: The XGBoost model can help improve prediction of COVID-19 cases in the USA over the ARIMA model.
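For readers unfamiliar with the three accuracy metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' code. The function names and the sample numbers are invented for illustration and do not come from the study.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Hypothetical numbers, NOT data from the paper.
actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 420.0]  # stands in for the closer-fitting model
model_b = [130.0, 170.0, 460.0]  # stands in for the weaker model

for name, forecast in (("model_a", model_a), ("model_b", model_b)):
    print(name, mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Lower values on all three metrics indicate a closer fit, which is the criterion the abstract uses to prefer XGBoost over ARIMA.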
Pages: 8