Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

Cited by: 50
Authors
Fang, Zheng-gang [1]
Yang, Shu-qin [1]
Lv, Cai-xia [1]
An, Shu-yi [2]
Wu, Wei [1]
Affiliations
[1] China Med Univ, Dept Epidemiol, Shenyang, Peoples R China
[2] Liaoning Prov Ctr Dis Control & Prevent, Dept Social Med & Hlth, Shenyang, Peoples R China
Source
BMJ OPEN | 2022 / Vol. 12 / Issue 07
Funding
National Natural Science Foundation of China
Keywords
COVID-19; epidemiology
DOI
10.1136/bmjopen-2021-056685
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Objective: The COVID-19 outbreak was first reported in Wuhan, China, and has been declared a pandemic because of its rapid worldwide spread. Predicting the trend of COVID-19 is of great significance for its prevention. The autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model were compared to determine which forecasts the occurrence of COVID-19 in the USA more accurately.
Design: Time-series study.
Setting: The USA.
Main outcome measures: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were used to evaluate the performance of the two models.
Results: For both the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were lower than those of the ARIMA model.
Conclusions: Compared with the ARIMA model, the XGBoost model can improve the prediction of COVID-19 cases in the USA.
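The abstract names its three accuracy metrics without defining them. Under the standard definitions, for observed values y_t and predictions ŷ_t over n time points, MAE = (1/n) Σ |y_t − ŷ_t|, RMSE = sqrt((1/n) Σ (y_t − ŷ_t)²) and MAPE = (100%/n) Σ |(y_t − ŷ_t) / y_t|. The minimal Python sketch below computes all three for one model's predictions. The function name `forecast_errors` and the toy numbers are illustrative assumptions, not taken from the study, and the authors' exact computation (for example, how MAPE is scaled or how zero counts are handled) may differ.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE and MAPE (as a percentage) for paired observations.

    Hypothetical helper for illustration; not the study's own code.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    # MAPE divides by each observed value, so it is undefined when any observation is zero.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(r) for r in residuals) / n                       # mean absolute error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)            # root mean square error
    mape = 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n  # percent error

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


if __name__ == "__main__":
    # Toy values only (hypothetical, not study data).
    observed = [100, 200, 300]
    forecast = [110, 190, 330]
    metrics = forecast_errors(observed, forecast)
    print({k: round(v, 2) for k, v in metrics.items()})
    # {'MAE': 16.67, 'RMSE': 19.15, 'MAPE': 8.33}
```

Applying the same function to the ARIMA and the XGBoost predictions for the same period and comparing the three values reproduces the kind of comparison the abstract reports; lower values on all three metrics indicate a closer fit to the observed series.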
Pages: 8
Related papers
50 in total
  • [1] Time-Series Forecasting and Analysis of COVID-19 Outbreak in Highly Populated Countries: A Data-Driven Approach
    Arunkumar, P. M.
    Ramasamy, Lakshmana Kumar
    Jayanthi, Amala M.
    INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS, 2022, 13 (02)
  • [2] Intelligent computing on time-series data analysis and prediction of COVID-19 pandemics
    Dash, Sujata
    Chakraborty, Chinmay
    Giri, Sourav K.
    Pani, Subhendu Kumar
    PATTERN RECOGNITION LETTERS, 2021, 151 (151) : 69 - 75
  • [3] Anomaly Detection in COVID-19 Time-Series Data
    Homayouni H.
    Ray I.
    Ghosh S.
    Gondalia S.
    Kahn M.G.
    SN Computer Science, 2021, 2 (4)
  • [4] Surrogate Parameters Optimization for Data and Model Fusion of COVID-19 Time-series Data
    Timilehin, Ogundare
    van Zyl, Terence L.
    2021 IEEE 24TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2021 : 821 - 827
  • [5] Data-Driven Prediction for COVID-19 Severity in Hospitalized Patients
    Alrajhi, Abdulrahman A.
    Alswailem, Osama A.
    Wali, Ghassan
    Alnafee, Khalid
    AlGhamdi, Sarah
    Alarifi, Jhan
    AlMuhaideb, Sarab
    ElMoaqet, Hisham
    AbuSalah, Ahmad
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2022, 19 (05)
  • [6] THE DATA-DRIVEN FUZZY COGNITIVE MAP MODEL AND ITS APPLICATION TO PREDICTION OF TIME SERIES
    Shan, Dan
    Lu, Wei
    Yang, Jianhua
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2018, 14 (05) : 1583 - 1602
  • [7] Modeling and forecasting the COVID-19 pandemic time-series data
    Doornik, Jurgen A.
    Castle, Jennifer L.
    Hendry, David F.
    SOCIAL SCIENCE QUARTERLY, 2021, 102 (05) : 2070 - 2087
  • [8] A Time-Series Data-Driven Method for Milling Force Prediction of Robotic Machining
    Wu, Kai
    Lu, Yuan
    Huang, Ruyi
    Kuhlenkotter, Bernd
    Li, Weihua
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 12
  • [9] Data-driven identification of temporal glucose patterns in a large cohort of nondiabetic patients with COVID-19 using time-series clustering
    Mistry, Sejal
    Gouripeddi, Ramkiran
    Facelli, Julio C.
    JAMIA OPEN, 2021, 4 (03)
  • [10] Application of a data-driven DTSF and benchmark models for the prediction of electricity prices in Brazil: A time-series case
    Gontijo, Tiago Silveira
    de Santis, Rodrigo Barbosa
    Costa, Marcelo Azevedo
    JOURNAL OF RENEWABLE AND SUSTAINABLE ENERGY, 2023, 15 (03)