Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods

Cited by: 20
Authors
Choi, Seong Gyu [1]
Oh, Minsuk [1,2]
Park, Dong-Hyuk [1]
Lee, Byeongchan [3]
Lee, Yong-ho [4]
Jee, Sun Ha [5]
Jeon, Justin Y. [1,2,6,7]
Affiliations
[1] Yonsei Univ, Dept Sports Ind Studies, Seoul, South Korea
[2] Yonsei Univ, Frontier Res Inst Convergence Sports Sci, Seoul, South Korea
[3] Gauss Labs, Seoul, South Korea
[4] Yonsei Univ, Dept Internal Med, Coll Med, Seoul, South Korea
[5] Yonsei Univ, Inst Hlth Promot, Grad Sch Publ Hlth, Seoul, South Korea
[6] ICONS, Exercise Med Ctr Diabet & Canc Patients, Seoul, South Korea
[7] Yonsei Univ, Canc Prevent Ctr Shinchon Severance, Coll Med, Seoul 120749, South Korea
Keywords
NUTRITION EXAMINATION SURVEY; IMPAIRED GLUCOSE-TOLERANCE; CROSS-VALIDATION; NATIONAL-HEALTH; RISK SCORE; CLASSIFICATION
DOI
10.1038/s41598-023-40170-0
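Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]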
Discipline classification codes
07; 0710; 09
Abstract
We compared the performance of machine learning-based models for predicting undiagnosed diabetes with that of traditional statistics-based prediction models. We used data from the 2014-2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N=32,827). The KNHANES 2014-2018 data were used as training and internal validation sets, and the 2019-2020 data as external validation sets. Prediction performance was compared using the area under the receiver operating characteristic curve (AUC). Using sex, age, resting heart rate, and waist circumference as features, the machine learning-based model showed a higher AUC than the traditional statistics-based model (0.788 vs. 0.740). Using sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based model again showed a higher AUC (0.802 vs. 0.759). With the feature set chosen for maximum prediction performance, the machine learning-based model also showed a higher AUC (0.819 vs. 0.765). Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform traditional statistics-based prediction models in predicting undiagnosed diabetes.
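The AUC comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `roc_auc`, the labels, and both score lists are invented for illustration and have no connection to KNHANES data. The sketch computes AUC from its rank-based definition, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise definition: the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predicted risks from two models
# (toy values, assumed only for this example).
labels = [0, 0, 0, 0, 1, 1, 1, 0]
model_a = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8, 0.35, 0.5]
model_b = [0.2, 0.6, 0.3, 0.7, 0.9, 0.5, 0.4, 0.1]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

Both models are scored on the same held-out labels, which mirrors the abstract's comparison on a shared external validation set. The pairwise loop is quadratic in sample size, so a sort-based implementation would be preferable for a survey-sized dataset.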
Pages: 11