Predicting adverse outcomes due to diabetes complications with machine learning using administrative health data

Cited by: 67
Authors
Ravaut, Mathieu [1, 2]
Sadeghi, Hamed [1]
Leung, Kin Kwan [1]
Volkovs, Maksims [1]
Kornas, Kathy [3]
Harish, Vinyas [3, 4]
Watson, Tristan [3, 5]
Lewis, Gary F. [6, 7]
Weisman, Alanna [8, 9]
Poutanen, Tomi [1]
Rosella, Laura [3, 5, 10, 11, 12]
Affiliations
[1] Layer 6 AI, Toronto, ON, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[3] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
[4] Univ Toronto, Temerty Fac Med, MD PhD Program, Toronto, ON, Canada
[5] ICES, Toronto, ON, Canada
[6] Univ Toronto, Temerty Fac Med, Dept Med, Toronto, ON, Canada
[7] Univ Toronto, Temerty Fac Med, Dept Physiol, Toronto, ON, Canada
[8] Mt Sinai Hosp, Lunenfeld Tanenbaum Res Inst, Toronto, ON, Canada
[9] Univ Toronto, Temerty Fac Med, Div Endocrinol & Metab, Toronto, ON, Canada
[10] Vector Inst, Toronto, ON, Canada
[11] Trillium Hlth Partners, Inst Better Hlth, Mississauga, ON, Canada
[12] Univ Toronto, Temerty Fac Med, Dept Lab Med & Pathol, Toronto, ON, Canada
Funding
Canadian Institutes of Health Research;
Keywords
CARDIOVASCULAR-DISEASE; MAJOR COMPLICATIONS; RISK; COSTS; POPULATION; CARE; EPIDEMIOLOGY; EXPLANATIONS; PREVALENCE; PREVENTION;
DOI
10.1038/s41746-021-00394-8
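Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)];
Subject classification code
Abstract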
Across jurisdictions, governments and health insurance providers hold a large amount of data from patient interactions with the healthcare system. We aimed to develop a machine learning-based model for predicting adverse outcomes due to diabetes complications using administrative health data from the single-payer health system in Ontario, Canada. A Gradient Boosting Decision Tree model was trained on data from 1,029,366 patients, validated on 272,864 patients, and tested on 265,406 patients. Discrimination was assessed using the AUC statistic, and calibration was assessed visually using calibration plots, both overall and across population subgroups. Our model predicting three-year risk of adverse outcomes due to diabetes complications (hyper/hypoglycemia, tissue infection, retinopathy, cardiovascular events, amputation) included 700 features from multiple diverse data sources and had strong discrimination (average test AUC = 77.7, range 77.7-77.9). Through the design and validation of a high-performance model to predict adverse outcomes due to diabetes complications at the population level, we demonstrate the potential of machine learning and administrative health data to inform health planning and healthcare resource allocation for diabetes management.
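The abstract evaluates the model by discrimination (AUC) and by calibration plots. As a rough illustration of what those two checks compute, the following is a minimal Python sketch. It is not the authors' code: the function names `roc_auc` and `calibration_bins`, the equal-width binning, and the toy inputs are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def calibration_bins(
    labels: Sequence[int], scores: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Points for a calibration plot using equal-width probability bins.

    Returns (mean predicted risk, observed event rate, count) for each
    non-empty bin; a well-calibrated model has the first two roughly equal.
    """
    acc = [[0.0, 0, 0] for _ in range(n_bins)]
    for y, p in zip(labels, scores):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[b][0] += p
        acc[b][1] += y
        acc[b][2] += 1
    return [(s / c, e / c, c) for s, e, c in acc if c > 0]


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score))
    print(calibration_bins(y_true, y_score, n_bins=2))
```

Note that the abstract reports AUC on a 0-100 scale (77.7), whereas this sketch returns a value between 0 and 1. Computing `calibration_bins` separately for each population subgroup would give the per-subgroup curves the abstract refers to.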
Pages: 12