Predicting Readmission or Death After Discharge From the ICU: External Validation and Retraining of a Machine Learning Model

Cited by: 25
Authors
de Hond, Anne A. H. [1 ,2 ,3 ]
Kant, Ilse M. J. [1 ,3 ]
Fornasa, Mattia [4 ]
Cina, Giovanni [4 ,5 ]
Elbers, Paul W. G. [6 ]
Thoral, Patrick J. J. [6 ]
Arbous, M. Sesmu [7 ]
Steyerberg, Ewout W. [3 ]
Affiliations
[1] Leiden Univ Med Ctr, Dept Informat Technol & Digital Innovat, Leiden, Netherlands
[2] Stanford Med, Dept Biomed Informat, Stanford, CA 94305 USA
[3] Leiden Univ, Med Ctr, Dept Biomed Data Sci, Leiden, Netherlands
[4] Pacmed, Stadhouderskade 55, Amsterdam, Netherlands
[5] Univ Amsterdam, Inst Logic Language & Computat, Amsterdam, Netherlands
[6] Amsterdam UMC, Dept Intens Care Med, Lab Crit Care Computat Intelligence, Amsterdam, Netherlands
[7] Leiden Univ, Med Ctr, Dept Intens Care Med, Leiden, Netherlands
Keywords
clinical decision support; critical care; data science; external validation; generalizability; machine learning; performance; framework
DOI
10.1097/CCM.0000000000005758
Chinese Library Classification (CLC) number
R4 [Clinical Medicine]
Discipline classification codes
1002; 100602
Abstract
OBJECTIVES: Many machine learning (ML) models have been developed for application in the ICU, but few have been subjected to external validation. The performance of these models in new settings therefore remains unknown. The objective of this study was to assess the performance of an existing decision support tool based on an ML model predicting readmission or death within 7 days after ICU discharge before, during, and after retraining and recalibration.
DESIGN: A gradient boosted ML model was developed and validated on electronic health record data from 2004 to 2021. We performed an independent validation of this model on electronic health record data from 2011 to 2019 from a different tertiary care center.
SETTING: Two ICUs in tertiary care centers in the Netherlands.
PATIENTS: Adult patients who were admitted to the ICU and stayed for longer than 12 hours.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: We assessed discrimination by area under the receiver operating characteristic curve (AUC) and calibration (slope and intercept). We retrained and recalibrated the original model and assessed performance via a temporal validation design. The final retrained model was cross-validated on all data from the new site. Readmission or death within 7 days after ICU discharge occurred in 577 of 10,052 ICU admissions (5.7%) at the new site. External validation revealed moderate discrimination with an AUC of 0.72 (95% CI 0.67-0.76). Retrained models showed improved discrimination with AUC 0.79 (95% CI 0.75-0.82) for the final validation model. Calibration was poor initially and good after recalibration via isotonic regression.
CONCLUSIONS: In this era of expanding availability of ML models, external validation and retraining are key steps to consider before applying ML models to new settings. Clinicians and decision-makers should take this into account when considering applying new ML models to their local settings.
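The abstract names three quantities without defining them: AUC, calibration slope and intercept, and recalibration via isotonic regression. The following is a minimal, standard-library Python sketch of how each could be computed. It is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the toy numbers are made up, and taking slope and intercept from a single logistic fit is only one common convention (the abstract does not specify the exact procedure).

```python
import math

def auc(y_true, y_score):
    """AUC: chance that a random event case scores above a random non-event case."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # A tie between an event and a non-event counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # keep the logit finite at 0 and 1
    return math.log(p / (1 - p))

def calibration_slope_intercept(y_true, y_prob, iters=50):
    """Fit y ~ a + b * logit(p) by Newton-Raphson; returns (slope b, intercept a).

    Ideal calibration gives b = 1 and a = 0 (assumed convention, see lead-in).
    """
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b, a

def isotonic_fit(scores, y_true):
    """Pool-adjacent-violators: non-decreasing map from score to observed event rate.

    Returns a list of (upper score bound, recalibrated probability) steps.
    """
    blocks = []  # each block: [sum of outcomes, count, largest score in block]
    for s, y in sorted(zip(scores, y_true)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the event rate fails to increase.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   >= blocks[-1][0] / blocks[-1][1]):
            sy, n, top = blocks.pop()
            blocks[-1][0] += sy
            blocks[-1][1] += n
            blocks[-1][2] = top
    return [(top, sy / n) for sy, n, top in blocks]

def isotonic_predict(steps, score):
    """Look up the recalibrated probability for a new score."""
    for top, rate in steps:
        if score <= top:
            return rate
    return steps[-1][1]

# Toy data, made up for illustration only.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))  # 0.75
```

In this convention, a calibration slope below 1 indicates predictions that are too extreme, which a monotone recalibration such as the isotonic step function can correct without changing the ranking of patients (and hence without improving AUC). In practice the recalibration map would be fitted on one period and evaluated on a later one, in line with the temporal validation design described in the abstract.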
Pages: 291-300
Number of pages: 10