Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real-world retrospective cohort study

Cited by: 4
Authors
Mao, Yaqian [1]
Zhu, Zheng [2]
Pan, Shuyao [2]
Lin, Wei [2]
Liang, Jixing [2]
Huang, Huibin [2]
Li, Liantao [2]
Wen, Junping [2]
Chen, Gang [2, 3]
Affiliations
[1] Fujian Med Univ, Fujian Prov Hosp, Dept Internal Med, South Branch,Shengli Clin Med Coll, Fuzhou, Peoples R China
[2] Fujian Med Univ, Fujian Prov Hosp, Dept Endocrinol, Shengli Clin Med Coll, Fuzhou, Peoples R China
[3] Fujian Acad Med, Fujian Prov Key Lab Med Anal, Fuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Diabetes; Machine learning algorithms; Predictive model; ARTIFICIAL NEURAL-NETWORK; CONVENTIONAL REGRESSION; LOGISTIC-REGRESSION; INTELLIGENCE; MODELS;
DOI
10.1111/jdi.13937
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Aims/Introduction: To compare the value of different machine learning (ML) algorithms for predicting diabetes risk.

Materials and Methods: This 3-year retrospective cohort study included 3,687 participants in the data analysis. Candidate modeling variables were screened using logistic regression (LR) analysis, and predictive models were built with 10-fold cross-validation. Six ML algorithms were used for model construction: random forests, light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), multi-layer perceptrons and Gaussian naive Bayes. Model performance was evaluated mainly by the area under the receiver operating characteristic curve (AUC). The best-performing ML model was compared with the traditional LR model and visualized using Shapley additive explanations (SHAP).

Results: Univariate and multivariate LR analysis identified the eight risk factors most strongly associated with the development of diabetes, and these were visualized as a nomogram. Among the six ML models, the random forests model had the best predictive performance. After 10-fold cross-validation, its optimal model had an AUC of 0.855 (95% confidence interval [CI] 0.823-0.886) in the training set and 0.835 (95% CI 0.779-0.892) in the test set. The traditional LR model had an AUC of 0.840 (95% CI 0.814-0.866) in the training set and 0.834 (95% CI 0.785-0.884) in the test set.

Conclusions: In real-world epidemiological research, combining traditional variable screening with ML algorithms to construct a diabetes risk prediction model has satisfactory clinical application value.
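
The two evaluation ingredients named in the abstract, 10-fold cross-validation and the area under the receiver operating characteristic curve, can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the function names `auc` and `k_fold_indices` are hypothetical, the toy labels and scores are made up for illustration, and the abstract does not say whether the folds were stratified or how the confidence intervals were computed, so neither is attempted here.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case (Mann-Whitney formulation; ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once, then deal them into k disjoint
    folds whose sizes differ by at most one (no stratification)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy check: 3 of the 4 positive/negative pairs are ranked correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Partition a cohort of the size reported in the abstract into 10 folds.
folds = k_fold_indices(3687, k=10)
```

In a full workflow each fold would serve once as the held-out set while a model is fitted on the other nine, and `auc` would be applied to the held-out predictions.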
Pages: 309-320
Number of pages: 12