Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts

Cited by: 8
Authors
Li, Yikuan [1, 2]
Salimi-Khorshidi, Gholamreza [1, 2]
Rao, Shishir [1, 2]
Canoy, Dexter [1, 2, 3]
Hassaine, Abdelaali [1, 2]
Lukasiewicz, Thomas [4]
Rahimi, Kazem [1, 2, 3]
Mamouei, Mohammad [1, 2]
Affiliations
[1] Univ Oxford, Oxford Martin Sch, Deep Med, Hayes House, 75 George St, Oxford OX1 2BQ, England
[2] Univ Oxford, Nuffield Dept Womens & Reprod Hlth, Med Sci Div, Oxford, England
[3] Oxford Univ Hosp NHS Fdn Trust, NIHR Oxford Biomed Res Ctr, Oxford, England
[4] Univ Oxford, Dept Comp Sci, Oxford, England
Source
Funding
UK Research and Innovation;
Keywords
Cardiovascular disease risk; Heart Failure; Stroke; Coronary heart disease; Predictive modelling; Data shifts; PROFILE;
DOI
10.1093/ehjdh/ztac061
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Aims: Deep learning has dominated predictive modelling across different fields, but in medicine it has been met with mixed reception. In clinical practice, simple statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to the knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts, a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several ML-based and established risk models.

Methods and results: Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests) and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions and (ii) different periods. Using internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve.

Conclusion: The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination, but if the prior distribution changes, the model may remain miscalibrated.

Graphical Abstract: Design and main results of the model evaluation in the presence of data shift.
EHR, electronic health records; HES, hospital episode statistics; HF, heart failure; CHD, coronary heart disease; CPH, Cox proportional hazard; ML, machine learning; DL, deep learning; RF, random forest.
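The abstract's closing point, that discrimination can hold up while calibration degrades when the prior distribution changes, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the risk scores are made-up toy values (not data from the study), the shift is simulated by replicating the non-cases so that only the outcome prevalence changes, and the helper names `auroc` and `observed_to_expected` are hypothetical.

```python
from statistics import mean


def auroc(pos_scores, neg_scores):
    """Probability that a random case outranks a random non-case (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def observed_to_expected(pos_scores, neg_scores):
    """Calibration-in-the-large: observed event rate divided by mean predicted risk."""
    observed = len(pos_scores) / (len(pos_scores) + len(neg_scores))
    expected = mean(pos_scores + neg_scores)
    return observed / expected


# Toy predicted risks (illustrative only, not from the paper).
cases = [0.35, 0.6, 0.8, 0.9]        # patients who had the event
non_cases = [0.1, 0.2, 0.3, 0.4]     # patients who did not

# Simulated prior shift: the same score distributions, but events are rarer.
shifted_non_cases = non_cases * 3

auc_source = auroc(cases, non_cases)
auc_shifted = auroc(cases, shifted_non_cases)
oe_source = observed_to_expected(cases, non_cases)
oe_shifted = observed_to_expected(cases, shifted_non_cases)

print(f"AUROC   source={auc_source:.4f}  shifted={auc_shifted:.4f}")
print(f"O/E     source={oe_source:.2f}  shifted={oe_shifted:.2f}")
```

In this toy setup the AUROC is 0.9375 in both cohorts, because it depends only on how cases rank against non-cases, while the observed-to-expected ratio moves from about 1.10 to about 0.71, meaning the unchanged model now overestimates risk in the lower-prevalence cohort.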
Pages: 535-547
Number of pages: 13