A Machine Learning Approach to Predicting Diabetes Complications

Cited by: 27
Authors
Jian, Yazan [1]
Pasquier, Michel [1]
Sagahyroon, Assim [1]
Aloul, Fadi [1]
Affiliations
[1] American University of Sharjah, Department of Computer Science and Engineering, Sharjah 26666, United Arab Emirates
Keywords
diabetes prediction; diabetes complications; supervised learning
DOI
10.3390/healthcare9121712
Chinese Library Classification
R19 [Health care organizations and services (health services administration)]
Abstract
Diabetes mellitus (DM) is a chronic disease that is considered to be life-threatening. It can affect any part of the body over time, resulting in serious complications such as nephropathy, neuropathy, and retinopathy. In this work, several supervised classification algorithms were applied to build models that predict and classify eight diabetes complications: metabolic syndrome, dyslipidemia, neuropathy, nephropathy, diabetic foot, hypertension, obesity, and retinopathy. For this study, a dataset collected by the Rashid Center for Diabetes and Research (RCDR), located in Ajman, UAE, was utilized. The dataset consists of 884 records with 79 features. Preprocessing steps were applied to handle missing values and class imbalance. Furthermore, feature selection was performed to select the top five and top ten features for each complication. The final number of records used to train the binary classifier for each complication was as follows: metabolic syndrome, 428; dyslipidemia, 836; neuropathy, 223; nephropathy, 233; diabetic foot, 240; hypertension, 586; obesity, 498; retinopathy, 228. Repeated stratified k-fold cross-validation (k = 10, with 10 repetitions) was employed for a better estimate of performance. Accuracy and F1-score were used to evaluate the models, reaching maxima of 97.8% and 97.7%, respectively. Moreover, a comparison of the performance achieved with different attribute sets showed that adequate classifiers can still be built from a reduced set of selected features.
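The evaluation protocol named in the abstract (repeated stratified k-fold cross-validation with k = 10 and 10 repetitions, scored by accuracy and F1) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the synthetic 30/70 label vector, and the majority-class baseline are all assumptions introduced here, and the paper's actual classifiers, preprocessing, and feature selection are not reproduced.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0
        for y in sorted(by_class):
            idx = by_class[y][:]
            rng.shuffle(idx)  # a fresh shuffle per repetition gives different splits
            # Deal indices round-robin so each fold gets a near-equal share of the class.
            for j, i in enumerate(idx):
                folds[(offset + j) % k].append(i)
            offset += len(idx)
        for f in range(k):
            test = sorted(folds[f])
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced labels: 30 positive, 70 negative (not the RCDR data).
labels = [1] * 30 + [0] * 70

accs, f1s = [], []
for train, test in repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    train_labels = [labels[i] for i in train]
    # Stand-in "model": always predict the majority class seen in training.
    majority = max(set(train_labels), key=train_labels.count)
    y_true = [labels[i] for i in test]
    y_pred = [majority] * len(test)
    accs.append(accuracy(y_true, y_pred))
    f1s.append(f1_score(y_true, y_pred))

mean_acc = sum(accs) / len(accs)
mean_f1 = sum(f1s) / len(f1s)
print(f"mean accuracy = {mean_acc:.3f}, mean F1 = {mean_f1:.3f}")
```

On these toy labels the majority-class baseline averages about 0.70 accuracy but an F1 of 0.0 for the positive class, which illustrates why reporting F1 alongside accuracy matters when classes are unbalanced.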
Pages: 19