Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods

Times cited: 20

Authors:

Choi, Seong Gyu [1]

Oh, Minsuk [1, 2]

Park, Dong-Hyuk [1]

Lee, Byeongchan [3]

Lee, Yong-ho [4]

Jee, Sun Ha [5]

Jeon, Justin Y. [1, 2, 6, 7]

Affiliations:

[1] Yonsei Univ, Dept Sports Ind Studies, Seoul, South Korea

[2] Yonsei Univ, Frontier Res Inst Convergence Sports Sci, Seoul, South Korea

[3] Gauss Labs, Seoul, South Korea

[4] Yonsei Univ, Dept Internal Med, Coll Med, Seoul, South Korea

[5] Yonsei Univ, Inst Hlth Promot, Grad Sch Publ Hlth, Seoul, South Korea

[6] ICONS, Exercise Med Ctr Diabet & Canc Patients, Seoul, South Korea

[7] Yonsei Univ, Canc Prevent Ctr Shinchon Severance, Coll Med, Seoul 120749, South Korea

Source:

SCIENTIFIC REPORTS | 2023 / Vol. 13 / No. 01

Keywords:

NUTRITION EXAMINATION SURVEY; IMPAIRED GLUCOSE-TOLERANCE; CROSS-VALIDATION; NATIONAL-HEALTH; RISK SCORE; CLASSIFICATION;

DOI:

10.1038/s41598-023-40170-0

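Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];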

Discipline classification codes:

07; 0710; 09;

Abstract:

We compared the prediction performance of machine learning-based models for undiagnosed diabetes with that of traditional statistics-based prediction models. We used data from the 2014-2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N=32,827). The KNHANES 2014-2018 data were used for training and internal validation, and the 2019-2020 data for external validation. Prediction performance of the machine learning-based and traditional statistics-based models was compared using the area under the receiver operating characteristic curve (AUC). With sex, age, resting heart rate, and waist circumference as features, the machine learning-based model achieved a higher AUC than the traditional statistics-based model (0.788 vs. 0.740). With sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based model again achieved a higher AUC (0.802 vs. 0.759). Using the feature set that gave maximum prediction performance, the machine learning-based model achieved a higher AUC than the traditional statistics-based model (0.819 vs. 0.765). Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform traditional statistics-based prediction models in predicting undiagnosed diabetes.
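As a minimal sketch of the evaluation design described in the abstract (a temporal split by survey year, followed by an AUC comparison of two models on the held-out years), the Python below computes the ROC AUC in its Mann-Whitney form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The record layout (a dict with a hypothetical `year` key) and the helper names `roc_auc` and `split_by_year` are illustrative assumptions, not the authors' code; the abstract does not say which machine learning or statistical models were used, and the toy inputs are made up.

```python
from typing import Iterable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win
    (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_by_year(records: Iterable[dict], train_years, external_years):
    """Temporal split: records from train_years go to model development
    (training + internal validation); records from external_years are held out
    as the external validation set. The 'year' key is a hypothetical field."""
    records = list(records)
    train: List[dict] = [r for r in records if r["year"] in train_years]
    external: List[dict] = [r for r in records if r["year"] in external_years]
    return train, external


if __name__ == "__main__":
    # Toy records, one per survey year (not KNHANES data).
    records = [{"year": y} for y in range(2014, 2021)]
    train, external = split_by_year(records, range(2014, 2019), range(2019, 2021))
    print(len(train), len(external))  # 5 2

    # Hypothetical risk scores from two competing models on a toy test set.
    labels = [0, 0, 1, 1]
    model_a = [0.1, 0.4, 0.35, 0.8]
    model_b = [0.1, 0.2, 0.3, 0.4]
    print(roc_auc(labels, model_a), roc_auc(labels, model_b))  # 0.75 1.0
```

The pairwise loop is O(n_pos × n_neg) and is written for readability; at the scale of N=32,827 a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.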

References

Pages: 11

48 references in total

[1] Akiba, Takuya; Sano, Shotaro; Yanase, Toshihiko; Ohta, Takeru; Koyama, Masanori. Optuna: A Next-generation Hyperparameter Optimization Framework [J]. KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019: 2623-2631

[2] Altmann, Andre; Tolosi, Laura; Sander, Oliver; Lengauer, Thomas. Permutation importance: a corrected feature importance measure [J]. BIOINFORMATICS, 2010, 26 (10): 1340-1347

[3] Bauer, E; Kohavi, R. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants [J]. MACHINE LEARNING, 1999, 36 (1-2): 105-139

[4] Berger, Jacob. Consciousness is not a property of states: A reply to Wilberg [J]. PHILOSOPHICAL PSYCHOLOGY, 2014, 27 (06): 829-842

[5] Borch-Johnsen, K. BMJ-BRIT MED J, 1998, 317: 371

[6] Breiman, L. Random forests [J]. MACHINE LEARNING, 2001, 45 (01): 5-32

[7] Breiman, L. Bagging predictors [J]. MACHINE LEARNING, 1996, 24 (02): 123-140

[8] Browne, MW. Cross-validation methods [J]. JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2000, 44 (01): 108-132

[9] Buhlmann, P. Handbook of Computational Statistics: Concepts and Methods, 2012: 985

[10] Chandrashekar, Girish; Sahin, Ferat. A survey on feature selection methods [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2014, 40 (01): 16-28
