Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real-world retrospective cohort study

Cited by: 4
Authors
Mao, Yaqian [1]
Zhu, Zheng [2]
Pan, Shuyao [2]
Lin, Wei [2]
Liang, Jixing [2]
Huang, Huibin [2]
Li, Liantao [2]
Wen, Junping [2]
Chen, Gang [2, 3]
Affiliations
[1] Fujian Med Univ, Fujian Prov Hosp, Dept Internal Med, South Branch,Shengli Clin Med Coll, Fuzhou, Peoples R China
[2] Fujian Med Univ, Fujian Prov Hosp, Dept Endocrinol, Shengli Clin Med Coll, Fuzhou, Peoples R China
[3] Fujian Acad Med, Fujian Prov Key Lab Med Anal, Fuzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Diabetes; Machine learning algorithms; Predictive model; ARTIFICIAL NEURAL-NETWORK; CONVENTIONAL REGRESSION; LOGISTIC-REGRESSION; INTELLIGENCE; MODELS;
DOI
10.1111/jdi.13937
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Aims/Introduction: To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction.
Materials and Methods: This 3-year retrospective cohort study included 3,687 participants in the data analysis. Modeling variables were screened using logistic regression (LR) analysis, and predictive models were built using 10-fold cross-validation. Six ML algorithms were used for model construction: random forests, light gradient boosting machine, extreme gradient boosting, adaptive boosting (AdaBoost), multi-layer perceptrons and Gaussian naive Bayes. Model performance was mainly evaluated by the area under the receiver operating characteristic curve (AUC). The best-performing ML model was compared with the traditional LR model and visualized using Shapley additive explanations.
Results: Univariate and multivariate LR analysis identified the eight risk factors most strongly associated with the development of diabetes, which were presented as a nomogram. Among the six ML models, the random forests model had the best predictive performance. After 10-fold cross-validation, its optimal model had an AUC of 0.855 (95% confidence interval [CI] 0.823-0.886) in the training set and 0.835 (95% CI 0.779-0.892) in the test set. The traditional LR model had an AUC of 0.840 (95% CI 0.814-0.866) in the training set and 0.834 (95% CI 0.785-0.884) in the test set.
Conclusions: In real-world epidemiological research, combining traditional variable screening with ML algorithms to construct a diabetes risk prediction model has satisfactory clinical application value.
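The evaluation described in the abstract (10-fold cross-validation scored by the area under the receiver operating characteristic curve) can be illustrated with a minimal standard-library sketch. This is not the authors' code, and the record does not say which software they used. The names `roc_auc`, `k_fold_indices` and `cross_validated_auc`, and the `fit` callable, are hypothetical and introduced only for illustration.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=None):
    """Split range(n_samples) into k disjoint folds of near-equal size.

    If seed is given, indices are shuffled first (assumption: the source
    does not say how its folds were drawn).
    """
    idx = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(features, labels, fit, k=10, seed=None):
    """Mean held-out AUC over k folds.

    `fit(train_X, train_y)` is a hypothetical callable that returns a
    scoring function mapping one feature row to a risk score.
    """
    aucs = []
    for fold in k_fold_indices(len(labels), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        aucs.append(roc_auc([labels[i] for i in fold],
                            [score(features[i]) for i in fold]))
    return sum(aucs) / len(aucs)
```

Under these assumptions, comparing candidate algorithms amounts to calling `cross_validated_auc` once per `fit` function and keeping the one with the highest mean AUC. Each fold must contain both outcome classes, otherwise `roc_auc` raises an error.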
Pages: 309-320
Number of pages: 12