Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study

Cited by: 3
Authors
Shojaee-Mend, Hassan [1]
Velayati, Farnia [2]
Tayefi, Batool [3]
Babaee, Ebrahim [3, 4, 5]
Affiliations
[1] Gonabad Univ Med Sci, Infect Dis Res Ctr, Gonabad, Iran
[2] Shahid Beheshti Univ Med Sci, Natl Res Inst TB & Lung Dis NRITLD, Telemed Res Ctr, Tehran, Iran
[3] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Med & Publ Hlth Res Ctr, Sch Med, Dept Community & Family Med, Tehran, Iran
[4] Iran Univ Med Sci, Vaccine Res Ctr, Tehran, Iran
[5] Iran Univ Med Sci, Psychosocial Hlth Res Inst, Prevent Publ Hlth Res Ctr, POB 14665-354, Tehran 1449614535, Iran
Keywords
Diabetes Mellitus; Machine Learning; Data Mining; Decision Trees; Risk Factors
DOI
10.4258/hir.2024.30.1.73
Chinese Library Classification (CLC) number
R-058
Abstract
Objectives: This study aimed to develop a model to predict fasting blood glucose status using machine learning and data mining, since the early diagnosis and treatment of diabetes can improve outcomes and quality of life. Methods: This cross-sectional study analyzed data from 3376 adults over 30 years old who participated in a diabetes screening program at 16 comprehensive health service centers in Tehran, Iran. The dataset was balanced using random sampling and the synthetic minority over-sampling technique (SMOTE), then split into a training set (80%) and a test set (20%). Shapley values were calculated to select the most important features. Noise analysis was performed by adding Gaussian noise to the numerical features to evaluate the robustness of feature importance. Five machine learning algorithms (CatBoost, random forest, XGBoost, logistic regression, and an artificial neural network) were used to model the dataset. Accuracy, sensitivity, specificity, the F1-score, and the area under the curve (AUC) were used to evaluate the models. Results: Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important factors for predicting fasting blood glucose status. Although the models achieved similar predictive ability, the CatBoost model performed slightly better overall, with an AUC of 0.737. Conclusions: A gradient boosted decision tree model identified the most important risk factors related to diabetes: age, waist-to-hip ratio, body mass index, and systolic blood pressure, in that order. This model can support planning for diabetes management and prevention.
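The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `classification_metrics` and `auc_score`, and the toy labels and scores, are hypothetical and chosen only to show how each metric is defined for a binary outcome (1 = positive class, 0 = negative class).

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)

toy_auc = auc_score([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable on a dataset of several thousand records.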
Pages: 73-82
Page count: 10