Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study

Times cited: 3
Authors
Shojaee-Mend, Hassan [1]
Velayati, Farnia [2]
Tayefi, Batool [3]
Babaee, Ebrahim [3, 4, 5]
Affiliations
[1] Gonabad Univ Med Sci, Infect Dis Res Ctr, Gonabad, Iran
[2] Shahid Beheshti Univ Med Sci, Natl Res Inst TB & Lung Dis NRITLD, Telemed Res Ctr, Tehran, Iran
[3] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Med & Publ Hlth Res Ctr, Sch Med, Dept Community & Family Med, Tehran, Iran
[4] Iran Univ Med Sci, Vaccine Res Ctr, Tehran, Iran
[5] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Publ Hlth Res Ctr, POB 14665-354, Tehran 1449614535, Iran
Keywords
Diabetes Mellitus; Machine Learning; Data Mining; Decision Trees; Risk Factors
DOI
10.4258/hir.2024.30.1.73
Chinese Library Classification number
R-058
Abstract
Objectives: This study aimed to develop a model to predict fasting blood glucose status using machine learning and data mining, since the early diagnosis and treatment of diabetes can improve outcomes and quality of life.
Methods: This cross-sectional study analyzed data from 3376 adults over 30 years old at 16 comprehensive health service centers in Tehran, Iran, who participated in a diabetes screening program. The dataset was balanced using random sampling and the synthetic minority over-sampling technique (SMOTE), then split into a training set (80%) and a test set (20%). Shapley values were calculated to select the most important features. Noise analysis was performed by adding Gaussian noise to the numerical features to evaluate the robustness of feature importance. Five machine learning algorithms (CatBoost, random forest, XGBoost, logistic regression, and an artificial neural network) were used to model the dataset. Accuracy, sensitivity, specificity, the F1-score, and the area under the curve (AUC) were used to evaluate the models.
Results: Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important factors for predicting fasting blood glucose status. Although the models achieved similar predictive ability, the CatBoost model performed slightly better overall, with an AUC of 0.737.
Conclusions: A gradient boosted decision tree model identified the most important risk factors related to diabetes: age, waist-to-hip ratio, body mass index, and systolic blood pressure. This model can support planning for diabetes management and prevention.
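
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, and the AUC is computed here by the pairwise-ranking definition rather than by any particular library.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity (recall), specificity, and F1-score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, specificity, and the F1-score depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason studies like this one often report it as the headline figure.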
Pages: 73-82
Number of pages: 10