Predicting disease onset from electronic health records for population health management: a scalable and explainable Deep Learning approach

Cited by: 2
Authors
Grout, Robert [1]
Gupta, Rishab [2]
Bryant, Ruby [3]
Elmahgoub, Mawada A. [3]
Li, Yijie [3]
Irfanullah, Khushbakht [3]
Patel, Rahul F. [3]
Fawkes, Jake [4]
Inness, Catherine [3]
Affiliations
[1] Accenture, Leeds, England
[2] Accenture, San Francisco, CA USA
[3] Accenture, London, England
[4] Univ Oxford, Dept Stat, Oxford, England
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2024 / Volume 6
Keywords
Population Health Management; Electronic Health Records; Deep Learning; chronic disease; Natural Language Processing; disease code embedding; ARTIFICIAL-INTELLIGENCE; CARE; TERMINOLOGY
DOI
10.3389/frai.2023.1287541
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Introduction: The move from a reactive model of care, which treats conditions when they arise, to a proactive model, which intervenes early to prevent adverse healthcare events, will benefit from advances in the predictive capabilities of Artificial Intelligence and Machine Learning. This paper investigates the ability of a Deep Learning (DL) approach to predict future disease diagnosis from Electronic Health Records (EHR) for the purposes of Population Health Management.
Methods: This study is based on longitudinal medical data from approximately 50 million patients in the USA. Embeddings were created using a Word2Vec algorithm from structured vocabulary commonly used in EHRs, e.g., Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) codes. We introduced a novel method of including binned observation values in an embeddings model. We also included novel features associated with wider determinants of health. Patient records comprising these embeddings were then fed to a Bidirectional Gated Recurrent Unit (GRU) model to predict the likelihood of patients developing Type 2 Diabetes Mellitus, Chronic Obstructive Pulmonary Disorder (COPD), or Hypertension, or experiencing an Acute Myocardial Infarction (MI), in the next 3 years. SHapley Additive exPlanations (SHAP) values were calculated to achieve model explainability.
Results: Increasing the data scope to include binned observations and wider determinants of health was found to improve predictive performance. We achieved an area under the Receiver Operating Characteristic curve of 0.92 for Diabetes prediction, 0.94 for COPD, 0.92 for Hypertension and 0.94 for MI. The SHAP values showed that the models had learned features known to be associated with these outcomes.
Discussion: The DL approach outlined in this study can identify clinically relevant features from large-scale EHR data and use these to predict future disease outcomes. This study highlights the promise of DL solutions for identifying patients at future risk of disease and providing clinicians with the means to understand and evaluate the drivers of those predictions.
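The abstract mentions including binned observation values in an embeddings model but does not describe the binning scheme. As a rough illustration only, the sketch below assumes quantile-based bins and a hypothetical token format (`CODE_BINk`), so that a numeric observation becomes a discrete token that could sit alongside diagnosis codes in a patient's sequence. The names `fit_bin_edges`, `observation_token`, `HBA1C` and `DX_HYPERTENSION` are invented for this example and are not taken from the paper or from SNOMED CT.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_bin_edges(values, n_bins=4):
    """Return the n_bins - 1 interior quantile cut points for one observation type."""
    return quantiles(values, n=n_bins)


def observation_token(code, value, edges):
    """Map a (code, numeric value) pair to a discrete token such as 'HBA1C_BIN2'."""
    # bisect_right counts how many cut points are <= value, giving a bin index 0..len(edges)
    return f"{code}_BIN{bisect_right(edges, value)}"


# Toy reference values for a hypothetical lab observation (assumed data, not from the study).
hba1c_reference = [5.0, 5.4, 5.8, 6.2, 6.6, 7.0, 7.4, 7.8]
edges = fit_bin_edges(hba1c_reference, n_bins=4)

# A patient record becomes a sequence of tokens: codes plus binned observations.
record = [
    "DX_HYPERTENSION",  # hypothetical diagnosis token
    observation_token("HBA1C", 7.9, edges),
    observation_token("HBA1C", 5.1, edges),
]
print(record)
```

Token sequences of this kind are the sort of input a Word2Vec-style model consumes; the choice of four quantile bins here is arbitrary and would need tuning per observation type.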
Pages: 19