An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values

Cited by: 10
Authors
Roy, Kumarmangal [1]
Ahmad, Muneer [1]
Waqar, Kinza [1]
Priyaah, Kirthanaah [1]
Nebhen, Jamel [2]
Alshamrani, Sultan S. [3]
Raza, Muhammad Ahsan [4]
Ali, Ihsan [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, POB 151, Alkharj 11942, Saudi Arabia
[3] Taif Univ, Dept Informat Technol, Coll Comp & Informat Technol, POB 11099, At Taif 21944, Saudi Arabia
[4] Bahauddin Zakariya Univ, Dept Informat Technol, Multan 60000, Pakistan
Keywords
NEURAL-NETWORKS
DOI
10.1155/2021/9953314
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis is challenging because the condition depends on many interacting factors, so there is a need for decision support systems that assist medical practitioners in the diagnosis process. This research proposes a predictive model that achieves high classification accuracy for type 2 diabetes. The study consists of two parts. First, it investigates three ways of handling missing data by imputation: median value imputation, K-nearest neighbor imputation, and iterative imputation. It then evaluates the effect of each imputation method on classification accuracy using linear, tree-based, and ensemble classification algorithms. Second, an Artificial Neural Network is trained on the best-performing imputed data, balanced with SMOTETomek so that each class is fairly represented. This approach achieved the best accuracy, 98% on the test data, outperforming the accuracies reported in prior studies on the same dataset. The dataset used in this study is limited to a particular gender and population. For future work, the study recommends adopting a larger population sample without geographic boundaries. In addition, because the Artificial Neural Network model did not undergo any specific hyperparameter tuning, it would be worthwhile to explore tuning on normalized data to improve accuracy further.
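To illustrate how two of the imputation strategies named in the abstract can fill the same gap differently, here is a minimal standard-library Python sketch of median imputation and K-nearest neighbor imputation. It is not the authors' implementation: the abstract does not say which software was used, and the function names (`median_impute`, `knn_impute`), the distance rule, and the toy values in `toy_rows` are all assumptions made for illustration. Iterative imputation and SMOTETomek balancing are not shown.

```python
import math
from statistics import median

# Hypothetical toy records (two numeric features); None marks a missing value.
# These numbers are illustrative only and do not come from the paper's dataset.
toy_rows = [
    [85.0, 22.0],
    [90.0, 24.0],
    [150.0, 35.0],
    [160.0, None],
    [None, 23.0],
]


def median_impute(rows):
    """Replace each missing value with the median of its column's observed values."""
    n_cols = len(rows[0])
    medians = [
        median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def _distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so rows
    with fewer shared features are not favored. Returns inf if nothing is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that feature over the
    k nearest rows that have the feature observed."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = sorted(
                (
                    (_distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                ),
                key=lambda pair: pair[0],
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:  # leave the gap if no comparable donor exists
                result[i][j] = sum(nearest) / len(nearest)
    return result


median_filled = median_impute(toy_rows)
knn_filled = knn_impute(toy_rows, k=2)
```

On these toy values, median imputation fills the missing first feature of the last record with 120.0 (the column median), while K-nearest neighbor imputation with `k=2` fills it with 87.5, the mean of the two records whose second feature is closest. Differences of this kind are why the choice of imputation method can change downstream classification accuracy.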
Pages: 21