A novel stacking ensemble for detecting three types of diabetes mellitus using a Saudi Arabian dataset: Pre-diabetes, T1DM, and T2DM

Cited by: 34
Authors
Gollapalli, Mohammed [1]
Alansari, Aisha [2]
Alkhorasani, Heba [2]
Alsubaii, Meelaf [2]
Sakloua, Rasha [2]
Alzahrani, Reem [2]
Al-Hariri, Mohammed [3]
Alfares, Maiadah [3]
AlKhafaji, Dania [4]
Al Argan, Reem [4]
Albaker, Waleed [4]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, Dept Comp Informat Syst, POB 1982, Dammam 31441, Saudi Arabia
[2] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, Dept Comp Engn, POB 1982, Dammam 31441, Saudi Arabia
[3] Imam Abdulrahman Bin Faisal Univ, Coll Med, Dept Physiol, POB 1982, Dammam 31441, Saudi Arabia
[4] Imam Abdulrahman Bin Faisal Univ, King Fahad Hosp Univ, Dept Internal Med, Coll Med, Khobar, Saudi Arabia
Keywords
Type 1 diabetes; Type 2 diabetes; Pre-diabetes; Machine learning; Stacking; Permutation feature importance
DOI
10.1016/j.compbiomed.2022.105757
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
Glucose is the primary source of energy for cells, the building blocks of life. Insulin enables cells to take up glucose, supporting the metabolic processes that keep people alive. An imbalance in glucose level is a sign of diabetes mellitus (DM), a common chronic disease. DM leads to long-term complications, such as blindness, kidney failure, and heart disease, which negatively affect quality of life. In Saudi Arabia, a ten-fold increase in diabetes cases has been documented within the last three years. DM is broadly categorized as Type 1 Diabetes (T1DM), Type 2 Diabetes (T2DM), and Pre-diabetes. Diagnosing the correct type is sometimes ambiguous for medical professionals, which makes it difficult to manage the progression of the illness. Intensive efforts have been made to predict T2DM; however, few studies focus on accurately identifying T1DM and Pre-diabetes. This study therefore uses Machine Learning (ML) to distinguish and predict the three types of diabetes from a Saudi Arabian hospital dataset, with the aim of controlling their progression. Four experiments were conducted to obtain the best results, using several algorithms: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (K-NN), Decision Tree (DT), Bagging, and Stacking. In experiments 2, 3, and 4, the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance the dataset. The novel Stacking model, which combined Bagging K-NN, Bagging DT, and K-NN base learners with a K-NN meta-classifier, gave promising results: an accuracy, weighted recall, weighted precision, and Cohen's kappa score of 94.48%, 94.48%, 94.70%, and 0.9172, respectively. Permutation feature importance identified five principal features that significantly affect model accuracy: Education, AntiDiab, Insulin, Nutrition, and Sex.
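The four reported metrics can all be derived from a multi-class confusion matrix. The sketch below is a minimal pure-Python illustration, not the authors' evaluation code: the function name `metrics_from_confusion` and the example matrix are hypothetical, and the counts are made up for illustration rather than taken from the study. It also shows why accuracy and weighted recall coincide (94.48% for both in the abstract): weighting each class's recall by its support reduces to the overall fraction of correct predictions.

```python
from typing import Dict, List


def metrics_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy, weighted recall, weighted precision, and Cohen's kappa.

    cm[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(cm[i]) for i in range(k)]  # support per true class
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    accuracy = sum(cm[i][i] for i in range(k)) / total

    # Per-class recall and precision, defined as 0 when the denominator is 0.
    recall = [cm[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(k)]
    precision = [cm[i][i] / pred_counts[i] if pred_counts[i] else 0.0 for i in range(k)]

    # "Weighted" averages use each class's share of the true labels as weight.
    weights = [true_counts[i] / total for i in range(k)]
    weighted_recall = sum(w * r for w, r in zip(weights, recall))
    weighted_precision = sum(w * p for w, p in zip(weights, precision))

    # Cohen's kappa: agreement beyond what the marginals alone would produce.
    expected = sum(true_counts[i] * pred_counts[i] for i in range(k)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "weighted_recall": weighted_recall,
        "weighted_precision": weighted_precision,
        "kappa": kappa,
    }


# Hypothetical 3-class example (rows: true class, columns: predicted class).
example_cm = [
    [45, 3, 2],
    [4, 44, 2],
    [1, 3, 46],
]
example_metrics = metrics_from_confusion(example_cm)
```

As a rough consistency check, assuming the three classes are balanced (as SMOTE would make them) and the predictions are roughly balanced too, chance agreement is about 1/3, so an accuracy of 0.9448 gives a kappa of about (0.9448 - 1/3) / (2/3) ≈ 0.917, in line with the reported 0.9172.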
Pages: 12