Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework

Cited by: 15
Authors
Xue, Mingyue [1 ,2 ]
Su, Yinxia [2 ]
Li, Chen [3 ]
Wang, Shuxia [4 ]
Yao, Hua [4 ]
Affiliations
[1] Xinjiang Med Univ, Hosp Tradit Chinese Med, Clin Med Coll 4, Urumqi, Peoples R China
[2] Xinjiang Med Univ, Coll Publ Hlth, Urumqi, Peoples R China
[3] Xinjiang Med Univ, Affiliated Hosp 1, Urumqi, Peoples R China
[4] Xinjiang Med Univ, Affiliated Hosp 1, Ctr Hlth Management, Urumqi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
LIFE-STYLE INTERVENTIONS; RISK STRATIFICATION; LOGISTIC-REGRESSION; ALCOHOL-CONSUMPTION; PREVENTION PROGRAM; FEATURE-SELECTION; DECISION-TREE; FOLLOW-UP; MELLITUS; CLASSIFICATION;
DOI
10.1155/2020/6873891
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Background. An estimated 425 million people worldwide have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a heavy burden on healthcare systems, especially in remote, underserved areas. Methods. A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. Risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. Results. XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F1 = 0.906, and AUC = 0.968). The variable importance scores from XGBoost showed that BMI was the most important feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). Conclusions. We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for diabetes risk at an early stage, and the variable importance scores offer clues for preventing the onset of diabetes.
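The abstract compares classifiers by accuracy, precision, recall, F1, and AUC. As a minimal sketch of how these five metrics are conventionally computed for a binary classifier, the Python below uses only the standard library. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions; they are not the study's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = T2DM, 0 = no T2DM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), so it is
    only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted probabilities of T2DM.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-free, which is one reason studies of this kind typically report both.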
Pages: 12