Improving prediction of COVID-19 mortality using machine learning in the Spanish SEMI-COVID-19 registry

Cited by: 0
Authors
José-Manuel Casas-Rojo
Paula Sol Ventura
Juan Miguel Antón Santos
Aitor Ortiz de Latierro
José Carlos Arévalo-Lorido
Marc Mauri
Manuel Rubio-Rivas
Rocío González-Vega
Vicente Giner-Galvañ
Bárbara Otero Perpiñá
Eva Fonseca-Aizpuru
Antonio Muiño
Esther Del Corral-Beamonte
Ricardo Gómez-Huelgas
Francisco Arnalich-Fernández
Mónica Llorente Barrio
Aresio Sancha-Lloret
Isabel Rábago Lorite
José Loureiro-Amigo
Santiago Pintos-Martínez
Eva García-Sardón
Adrián Montaño-Martínez
María Gloria Rojano-Rivero
José-Manuel Ramos-Rincón
Alejandro López-Escobar
Affiliations
[1] Infanta Cristina University Hospital, Internal Medicine Department
[2] Hospital HM Nens, Department of Pediatric Endocrinology
[3] HM Hospitales, Internal Medicine Department
[4] Hospital Universitario Infanta Cristina. Parla, Data Scientist
[5] Kaizen AI, Internal Medicine Department
[6] Complejo Hospitalario Universitario, Internal Medicine Department
[7] Bellvitge University Hospital, Internal Medicine Department
[8] Hospital Costa del Sol, Internal Medicine Department
[9] Hospital Universitario San Juan. San Juan de Alicante, Internal Medicine Department
[10] Hospital Universitario 12 de Octubre, Internal Medicine Department
[11] Hospital Universitario de Cabueñes, Internal Medicine Department
[12] Hospital Universitario Gregorio Marañón, Clinical Medicine Department
[13] Hospital Royo Villanova, Internal Medicine Department
[14] Regional University Hospital of Málaga, Internal Medicine Department
[15] Biomedical Research Institute of Málaga (IBIMA), Internal Medicine Department
[16] University of Málaga (UMA), Internal Medicine Department
[17] Hospital Universitario La Paz-Cantoblanco, Internal Medicine Department
[18] Hospital Universitario Miguel Servet, Internal Medicine Department
[19] Hospital Universitario La Princesa, Internal Medicine Department
[20] Hospital Universitario Infanta Sofía. San Sebastián de los Reyes, Internal Medicine Department
[21] Hospital Moisès Broggi, Clinical Medicine Department
[22] Sant Joan Despí, Pediatrics Department, Clinical Research Unit
[23] Hospital Universitario de Sagunto
[24] Hospital Universitario de Cáceres
[25] Internal Medicine Department
[26] Hospital de Montilla
[27] Internal Medicine Department
[28] Hospital Infanta Elena
[29] Miguel Hernandez University of Elche
[30] Hospital Universitario Vithas Madrid La Milagrosa
[31] Fundación Vithas
Source
Internal and Emergency Medicine | 2023 / Vol. 18
Keywords
COVID-19; Machine learning; Deep learning; Mortality; Spain
DOI
Not available
Abstract
COVID-19 is responsible for high mortality, but robust machine learning-based predictors of mortality are lacking. This study aimed to generate a model for predicting mortality in patients hospitalized with COVID-19 using Gradient Boosting Decision Trees (GBDT). The Spanish SEMI-COVID-19 registry includes 24,514 pseudo-anonymized cases of patients hospitalized with COVID-19 from 1 February 2020 to 5 December 2021. This registry was used to train a GBDT machine learning model, employing the CatBoost classifier and BorutaShap to select the most relevant indicators and generate a mortality prediction model that outputs a risk level ranging from 0 to 1. The model was validated by separating patients according to admission date, using the period 1 February to 31 December 2020 (first and second waves, pre-vaccination period) for training, and 1 January to 30 November 2021 (vaccination period) for the test group. An ensemble of ten models with different random seeds was constructed, using 80% of the patients for training and the 20% admitted at the end of the training period for cross-validation. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. Clinical and laboratory data from 23,983 patients were analyzed. The CatBoost mortality prediction models achieved an AUC of 84.76 (standard deviation 0.45) for patients in the test group (potentially vaccinated patients not included in model training) using 16 features. Although it requires a relatively large number of predictors, the 16-parameter GBDT model shows a high predictive capacity for COVID-19 hospital mortality.
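The validation scheme described in the abstract (a split by admission date, a hold-out taken from the end of the training period, averaged risk scores from several models, and AUC as the metric) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names `temporal_split`, `roc_auc`, and `ensemble_scores`, the record fields `admitted` and `died`, and the toy records are all assumptions made for the example, and no CatBoost or BorutaShap call is shown. Averaging the per-model risk scores is also an assumption, since the abstract does not say how the ten models were combined.

```python
from datetime import date, timedelta
from statistics import mean


def temporal_split(records, test_start, val_fraction=0.2):
    """Split by admission date (hypothetical helper, not from the paper).

    Records admitted before `test_start` form the development set; the most
    recent `val_fraction` of those is held out for validation. Records
    admitted on or after `test_start` form the test set.
    """
    ordered = sorted(records, key=lambda r: r["admitted"])
    dev = [r for r in ordered if r["admitted"] < test_start]
    test = [r for r in ordered if r["admitted"] >= test_start]
    cut = len(dev) - round(len(dev) * val_fraction)
    return dev[:cut], dev[cut:], test


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_scores(per_model_scores):
    """Average the per-patient risk scores (0 to 1) of several models,
    e.g. models trained with different random seeds (assumed combination)."""
    return [mean(column) for column in zip(*per_model_scores)]


# Toy records, fabricated for illustration only (not registry data).
first_day = date(2020, 2, 1)
records = [
    {"admitted": first_day + timedelta(days=30 * i), "died": i % 2}
    for i in range(14)
]

# Training period ends 31 December 2020; testing starts 1 January 2021.
train, val, test = temporal_split(records, test_start=date(2021, 1, 1))
print(len(train), len(val), len(test))
```

In a real setting, each of the ten seeds would produce one fitted classifier, `ensemble_scores` would combine their predicted risks for the test patients, and `roc_auc` would be evaluated on those combined scores against the observed outcomes.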
Pages: 1711-1722
Number of pages: 11