Forecasting the dynamics of cumulative COVID-19 cases (confirmed, recovered and deaths) for top-16 countries using statistical machine learning models: Auto-Regressive Integrated Moving Average (ARIMA) and Seasonal Auto-Regressive Integrated Moving Average (SARIMA)

Cited by: 119
Authors
ArunKumar, K. E. [1]
Kalaga, Dinesh V. [2]
Kumar, Ch. Mohan Sai [3]
Chilkoor, Govinda [4]
Kawaji, Masahiro [2]
Brenza, Timothy M. [1, 5]
Affiliations
[1] South Dakota Sch Mines & Technol, Dept Chem & Biol Engn, Rapid City, SD 57701 USA
[2] CUNY, Mech Engn Dept, New York, NY 10031 USA
[3] CSIR Cent Inst Med & Aromat Plants, Proc Chem & Technol, Lucknow 226015, Uttar Pradesh, India
[4] South Dakota Sch Mines & Technol, Dept Civil & Environm Engn, Rapid City, SD 57701 USA
[5] South Dakota Sch Mines & Technol, Biomed Engn Program, Rapid City, SD 57701 USA
Keywords
COVID-19; Statistical modeling; Pandemic; ARIMA; SARIMA; Time-series forecast; Auto-Correlation Function (ACF); Akaike Information Criteria (AIC); Bayesian Information Criterion (BIC); SPREAD
DOI
10.1016/j.asoc.2021.107161
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most countries are reopening or considering lifting stringent prevention policies such as lockdowns; consequently, daily coronavirus disease (COVID-19) cases (confirmed, recovered and deaths) are increasing significantly. As of July 25th, there were 16.5 million global cumulative confirmed cases, 9.4 million cumulative recovered cases and 0.65 million deaths. There is a pressing need to monitor and estimate future COVID-19 cases in order to control the spread and help countries prepare their healthcare systems. In this study, the time-series models Auto-Regressive Integrated Moving Average (ARIMA) and Seasonal Auto-Regressive Integrated Moving Average (SARIMA) are used to forecast the epidemiological trends of the COVID-19 pandemic for the top-16 countries, which account for 70%-80% of global cumulative cases. Initial combinations of the model parameters were selected using the auto-ARIMA model, and the optimized model parameters were then found based on the best fit between the predictions and the test data. The analytical tools Auto-Correlation Function (ACF), Partial Auto-Correlation Function (PACF), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used to assess the reliability of the models. The evaluation metrics Mean Absolute Error (MAE), Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Percent Error (MAPE) were used as criteria for selecting the best model. A case study is presented in which the statistical methodology for model selection and the forecasting procedure are discussed in detail for the COVID-19 cases of the USA. The best ARIMA and SARIMA model parameters for each country are selected manually, and the optimized parameters are then used to forecast the COVID-19 cases. Forecasted trends for confirmed and recovered cases showed an exponential rise for countries such as the United States, Brazil, South Africa, Colombia, Bangladesh, India, Mexico and Pakistan. Similarly, trends for cumulative deaths showed an exponential rise for Brazil, South Africa, Chile, Colombia, Bangladesh, India, Mexico, Iran, Peru and Russia. SARIMA model predictions are more realistic than those of the ARIMA model, confirming the existence of seasonality in COVID-19 data. The results of this study not only shed light on the future trends of the COVID-19 outbreak in the top-16 countries but also guide these countries in preparing their health care policies for the ongoing pandemic. The data used in this work were obtained from the publicly available Johns Hopkins University COVID-19 database. (C) 2021 Elsevier B.V. All rights reserved.
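The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `forecast_errors` and the sample numbers are hypothetical, and the formulas are the standard textbook definitions of MAE, MSE, RMSE and MAPE.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; MAPE assumes every actual value is
    non-zero, which holds for cumulative case counts once cases are reported.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean square error
    rmse = math.sqrt(mse)                          # root mean square error
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Made-up held-out values and forecasts, not data from the paper.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
scores = forecast_errors(actual, predicted)
print(scores)
```

Under these definitions, a candidate model would be compared with another by computing the same scores on the same held-out test window and preferring the lower values.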
Pages: 26