Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

Cited by: 13
Authors
Feng, Cindy [1]
Kephart, George [1]
Juarez-Colunga, Elizabeth [2]
Affiliations
[1] Dalhousie Univ, Dept Community Hlth & Epidemiol, Fac Med, 5790 Univ Ave, Halifax, NS B3H 1V7, Canada
[2] Univ Colorado, Dept Biostat & Informat, Anschutz Med Campus, Aurora, CO 80045 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
COVID-19; mortality; Predictive model; Generalized additive model; Classification trees; Extreme gradient boosting; LOGISTIC-REGRESSION; MODELS
DOI
10.1186/s12874-021-01441-4
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk.
Methods: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of each method on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier's score, calibration intercept and calibration slope.
Results: XGBoost was highly discriminative, with an AUC of 0.9669, and outperformed the conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier's scores.
Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
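The four accuracy measures named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names and the toy data are hypothetical, and the calibration fit here estimates intercept and slope jointly by regressing the outcome on the logit of the predicted risk, which is one common convention (some studies instead estimate the intercept with the slope fixed at 1; the abstract does not say which was used).

```python
import math


def auc(y_true, y_prob):
    """AUC as the share of (death, survivor) pairs ranked correctly; ties count 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def _logit(p, eps=1e-12):
    # Clip so that predictions of exactly 0 or 1 do not produce infinities.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y_true, y_prob, max_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p_hat) by Newton-Raphson (assumed convention)."""
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        mu = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g_a = sum(y - m for y, m in zip(y_true, mu))
        g_b = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        # Entries of the 2x2 information matrix.
        w = [m * (1.0 - m) for m in mu]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            raise ValueError("predictions vary too little to fit a slope")
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical toy data: ten cases, predicted risks of 0.2 or 0.8.
y_toy = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
p_toy = [0.2] * 5 + [0.8] * 5

print("AUC:", auc(y_toy, p_toy))
print("Brier score:", brier_score(y_toy, p_toy))
print("Calibration (intercept, slope):", calibration_intercept_slope(y_toy, p_toy))
```

Higher AUC and lower Brier score indicate better predictions; a calibration intercept near 0 and slope near 1 indicate that predicted risks match observed event rates. In the toy data the observed death rates in the two risk groups equal the predicted risks, so the fitted intercept and slope come out at approximately 0 and 1.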
Pages: 14
Related papers
  • [1] Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
    Cindy Feng
    George Kephart
    Elizabeth Juarez-Colunga
    BMC Medical Research Methodology, 21
  • [2] TREE-BASED MACHINE LEARNING METHODS FOR MODELING AND FORECASTING MORTALITY
    Bjerre, Dorethe Skovgaard
    ASTIN BULLETIN-THE JOURNAL OF THE INTERNATIONAL ACTUARIAL ASSOCIATION, 2022, 52 (03) : 765 - 787
  • [3] A machine learning based exploration of COVID-19 mortality risk
    Mahdavi, Mahdi
    Choubdar, Hadi
    Zabeh, Erfan
    Rieder, Michael
    Safavi-Naeini, Safieddin
    Jobbagy, Zsolt
    Ghorbani, Amirata
    Abedini, Atefeh
    Kiani, Arda
    Khanlarzadeh, Vida
    Lashgari, Reza
    Kamrani, Ehsan
    PLOS ONE, 2021, 16 (07):
  • [4] Comparison of machine learning and the regression-based EHMRG model for predicting early mortality in acute heart failure
    Austin, David E.
    Lee, Douglas S.
    Wang, Chloe X.
    Ma, Shihao
    Wang, Xuesong
    Porter, Joan
    Wang, Bo
    INTERNATIONAL JOURNAL OF CARDIOLOGY, 2022, 365 : 78 - 84
  • [5] Comparison of regression tree-based methods in genomic selection
    Ashoori-Banaei, Sahar
    Ghafouri-Kesbi, Farhad
    Ahmadi, Ahmad
    JOURNAL OF GENETICS, 2021, 100 (02)
  • [6] Comparison of regression tree-based methods in genomic selection
    Sahar Ashoori-Banaei
    Farhad Ghafouri-Kesbi
    Ahmad Ahmadi
    Journal of Genetics, 2021, 100
  • [7] On Predicting COVID-19 Fatality Ratio Based on Regression Using Machine Learning Model
    Bhuiyan, Mafijul Islam
    Ahmed, Mondar Maruf Moin
    Alvi, Anik
    Islam, Safiqul
    Mondal, Prasenjit
    Hossain, Akbar
    Hoque, S. N. M. Azizul
    ADVANCED INFORMATION NETWORKING AND APPLICATIONS, AINA-2022, VOL 2, 2022, 450 : 329 - 338
  • [8] Tree-based Machine Learning Methods for Survey Research
    Kern, Christoph
    Klausch, Thomas
    Kreuter, Frauke
    SURVEY RESEARCH METHODS, 2019, 13 (01): : 73 - 93
  • [9] Regression-based classification methods and their comparison with decision tree algorithms
    Kiselev, MV
    Ananyan, SM
    Arseniev, SB
    PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1997, 1263 : 134 - 144
  • [10] Development and validation of a symbolic regression-based machine learning method to predict COVID-19 in-hospital mortality among vaccinated patients
    Sofos, Filippos
    Rouka, Erasmia
    Triantafyllia, Vasiliki
    Andreakos, Evangelos
    Gourgoulianis, Konstantinos I.
    Karakasidis, Efstathios
    Karakasidis, Theodoros
    HEALTH AND TECHNOLOGY, 2024, 14 (06) : 1217 - 1228