Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records

Cited by: 69
Authors
Nguyen, Binh P. [1]
Pham, Hung N. [2]
Tran, Hop [1]
Nghiem, Nhung [3]
Nguyen, Quang H. [2]
Do, Trang T. T. [4]
Cao Truong Tran [5]
Simpson, Colin R. [6, 7]
Affiliations
[1] Victoria Univ Wellington, Sch Math & Stat, Wellington 6140, New Zealand
[2] Hanoi Univ Sci & Technol, Sch Informat & Commun Technol, 1 Dai Co Viet Rd, Hanoi 100000, Vietnam
[3] Univ Otago, Dept Publ Hlth, 23A Mein St, Wellington 6021, New Zealand
[4] Agcy Sci Technol & Res, Inst Infocomm Res, 1 Fusionopolis Way, Singapore 138632, Singapore
[5] Le Quy Don Tech Univ, Fac Informat Technol, 236 Hoang Quoc Viet St, Hanoi 100000, Vietnam
[6] Victoria Univ Wellington, Fac Hlth, Wellington 6140, New Zealand
[7] Univ Edinburgh, Usher Inst, Edinburgh EH8 9AG, Midlothian, Scotland
Keywords
Electronic health records; Incidence; Onset; Prediction; Type 2 diabetes mellitus; Wide and deep learning; POPULATION; MODELS
DOI
10.1016/j.cmpb.2019.105055
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Current methods of treating diabetes are inadequate and costly, so prevention is an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs) for individuals and populations have become important tools for understanding developing trends in diseases. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model, which combines the strength of a generalised linear model with various features and a deep feed-forward neural network, to improve the prediction of the onset of type 2 diabetes mellitus (T2DM).

Materials and methods: The proposed method was implemented by training various models with a logistic loss function using stochastic gradient descent. We applied this model to public hospital record data provided by Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of whom 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data obtained from previous years (2009-2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) to each cross-validation training fold, in order to analyse performance when synthetic examples for the minority class are created. We used SMOTE of 150 and 300 percent, in which 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively.

Results: Our final ensemble model not using SMOTE obtained an accuracy of 84.28%, area under the receiver operating characteristic curve (AUC) of 84.13%, sensitivity of 31.17% and specificity of 96.85%. Using SMOTE of 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively) with a moderate decrease in specificity (90.16% and 76.59%, respectively).

Discussion and conclusions: Our algorithm has further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture. © 2019 Elsevier B.V. All rights reserved.
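The class ratios and the reported accuracy in the abstract can be checked with a little arithmetic. The sketch below is not the authors' code. It uses only the counts and rates quoted in the abstract, and it makes two assumptions: each training fold has the same class proportions as the full dataset, and the evaluation data has the same T2DM prevalence as the full dataset. The function names are hypothetical.

```python
# Figures taken from the abstract; everything else is illustrative.
N_TOTAL = 9948                 # patients in the dataset
N_T2DM = 1904                  # patients diagnosed with T2DM
N_OTHER = N_TOTAL - N_T2DM     # patients without a T2DM diagnosis


def ratio_after_smote(n_minority, n_majority, smote_percent):
    """Return the majority:minority ratio after SMOTE oversampling.

    smote_percent=300 means three synthetic instances are added for each
    original minority instance, so the minority class grows by a factor
    of 1 + smote_percent / 100.
    """
    n_minority_after = n_minority * (1 + smote_percent / 100)
    return n_majority / n_minority_after


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity and class prevalence."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Assumption: fold-level class proportions match the full dataset.
for pct in (0, 150, 300):
    r = ratio_after_smote(N_T2DM, N_OTHER, pct)
    print(f"SMOTE {pct:>3}%: diabetes:non-diabetes is about 1:{r:.2f}")

# Assumption: evaluation prevalence equals the dataset prevalence.
prevalence = N_T2DM / N_TOTAL
acc = accuracy_from_rates(0.3117, 0.9685, prevalence)
print(f"Implied accuracy without SMOTE: {acc:.2%}")
```

Under these assumptions the ratios come out near 1:1.7 and 1:1.06, which the abstract rounds to 1:2 and 1:1, and the implied accuracy without SMOTE agrees with the reported 84.28%.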
Pages: 9