Multivariable prediction model of complications derived from diabetes mellitus using machine learning on scarce highly unbalanced data

Times cited: 0
Authors
Colmenares-Mejia, Claudia C. [1]
Rincon-Acuna, Juan C. [2, 3]
Cely, Andres [1, 4]
Gonzalez-Velez, Abel E. [5]
Castillo, Andrea [6]
Murcia, Jossie [7]
Isaza-Ruget, Mario A. [8]
Affiliations
[1] Fdn Univ Sanitas, Bogota, DC, Colombia
[2] Univ Santander, Campus Lagos del Cacique, Bucaramanga, Santander, Colombia
[3] Keralty, Corp Data Management, Bogota, DC, Colombia
[4] Univ Nacl Colombia, Bogota, DC, Colombia
[5] Univ Hosp Torrejon, Prevent Med Serv, Torrejon De Ardoz, Spain
[6] EPS Sanitas, Direcc Gest Conocimiento, Bogota, DC, Colombia
[7] Fdn Univ Sanitas, Inst Gerencia & Gest Sanitaria, Bogota, DC, Colombia
[8] Fdn Univ Sanitas, Res Grp INPAC, Bogota, DC, Colombia
Keywords
Complications; Diabetes mellitus; Machine learning; Predictive analytics; Risk predictions
DOI
10.1007/s13410-023-01264-7
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Diabetes mellitus (DM) increases the risk of complications in addition to mortality. Quantifying the risk of complications using artificial intelligence could be a way to design comprehensive patient healthcare programs.
Objective: To predict the probability of macrovascular and microvascular complications in patients with DM using machine learning.
Methods: Retrospective cohort study. From an outpatient follow-up program for diabetic patients, 64,081 records and 287 variables were identified, with highly unbalanced data. Predictive models for chronic kidney disease (CKD), lower extremity amputation (LEA), coronary heart disease (CHD), and early mortality (MOR) were developed. An exhaustive computational search was conducted to find the best combination of machine learning (ML) algorithm and sampling method.
Results: The best model was determined by assessing its performance through the heuristics obtained from a comprehensive analysis of the accuracy and F1 values for ML, sampling, and dataset. Accuracy was 99.9% for LEA, 94.3% for CHD, 97.4% for MOR, and 98.8% for CKD. F1, assessed to identify false positives, was 84.5% for CKD, 63.6% for MOR, 46.2% for LEA, and 44.8% for CHD.
Conclusions: This ML model can be applied to predict CHD, CKD, and MOR. The success of ML predictions lies in the clinical definition of the initial variables and their simplification into variables from which the algorithms can identify patients who are likely to develop a complication. For clinical application of this system, it is necessary to assess the cross performance of metrics, as found here (accuracy higher than 95% and F1-score higher than 80%).
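The abstract reports accuracy and F1 side by side because, on highly unbalanced data, accuracy alone can look excellent while the minority class is missed entirely. The following is a minimal sketch of that idea, not the authors' code. The `Confusion` class, the 10,000-patient example counts, and the `meets_thresholds` helper are hypothetical; only the percentages in `REPORTED` and the 95% / 80% cut-offs come from the abstract, and the strict "higher than" comparison is an assumption about how those cut-offs are read.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # complication predicted and present
    fp: int  # complication predicted but absent
    fn: int  # complication missed
    tn: int  # correctly predicted as no complication


def accuracy(c: Confusion) -> float:
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no positives at all."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Illustrative counts only: 10,000 patients, 50 of whom develop the complication,
# scored by a trivial model that never predicts a complication.
always_negative = Confusion(tp=0, fp=0, fn=50, tn=9950)
acc_trivial = accuracy(always_negative)  # 0.995: looks excellent
f1_trivial = f1_score(always_negative)   # 0.0: no case was detected

# (accuracy %, F1 %) per complication, as reported in the abstract.
REPORTED = {
    "CKD": (98.8, 84.5),
    "MOR": (97.4, 63.6),
    "LEA": (99.9, 46.2),
    "CHD": (94.3, 44.8),
}


def meets_thresholds(acc: float, f1: float,
                     min_acc: float = 95.0, min_f1: float = 80.0) -> bool:
    """Joint check: both metrics must be strictly above their cut-offs."""
    return acc > min_acc and f1 > min_f1


passing = [name for name, (acc, f1) in REPORTED.items()
           if meets_thresholds(acc, f1)]
```

Under this strict reading of the two cut-offs, `passing` contains only CKD among the reported figures.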
Pages: 528-538
Number of pages: 11