An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values

Cited by: 10
Authors
Roy, Kumarmangal [1 ]
Ahmad, Muneer [1 ]
Waqar, Kinza [1 ]
Priyaah, Kirthanaah [1 ]
Nebhen, Jamel [2 ]
Alshamrani, Sultan S. [3 ]
Raza, Muhammad Ahsan [4 ]
Ali, Ihsan [1 ]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, POB 151, Alkharj 11942, Saudi Arabia
[3] Taif Univ, Dept Informat Technol, Coll Comp & Informat Technol, POB 11099, At Taif 21944, Saudi Arabia
[4] Bahauddin Zakariya Univ, Dept Informat Technol, Multan 60000, Pakistan
Keywords
NEURAL-NETWORKS;
DOI
10.1155/2021/9953314
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis is challenging because the condition depends on many interrelated factors, so there is a need for decision support systems that assist medical practitioners in the diagnosis process. This research proposes a predictive model that achieves high classification accuracy for type 2 diabetes. The study consisted of two parts. First, it investigated handling missing data through three imputation methods: median value imputation, K-nearest neighbor imputation, and iterative imputation. It then validated these imputations using various classification algorithms (linear, tree-based, and ensemble) to see how each method affected classification accuracy. Second, an Artificial Neural Network was used to model the best-performing imputed data, balanced with SMOTETomek so that each class is represented fairly. This approach gave the best accuracy, 98% on the test data, outperforming accuracies reported in prior studies using the same dataset. The dataset used in this study is limited to a specific gender and population. As future work, the study recommends adopting a larger population sample without geographic boundaries. Additionally, because the Artificial Neural Network model did not undergo any specific hyperparameter tuning, it would be interesting to explore tuning on normalized data to optimize accuracy further.
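As an illustration of the first part of the study, the sketch below implements two of the three imputation strategies named in the abstract (median value and K-nearest neighbor) using only the Python standard library. It is not the authors' code: the function names, the two-column toy data (labeled glucose and BMI), and the choice of k = 2 are all assumptions made for this example, and iterative imputation, the classifiers, SMOTETomek balancing, and the neural network are left out. The distance used here rescales by the number of shared features, which is one common convention for rows with gaps.

```python
import math
from statistics import median


def median_impute(rows):
    """Replace each None with the median of the observed values in its column."""
    n_cols = len(rows[0])
    medians = [
        median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)
    ]
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared distance is rescaled by (total features / shared features);
    rows with no shared observed feature are treated as infinitely far apart.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest rows
    that have the feature observed. A gap stays None if no usable neighbor exists."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical toy data: [glucose, BMI]; None marks a missing measurement.
rows = [
    [148.0, 34.0],
    [85.0, 27.0],
    [None, 23.0],
    [89.0, None],
    [137.0, 43.0],
]

if __name__ == "__main__":
    print("median:", median_impute(rows))
    print("knn   :", knn_impute(rows, k=2))
```

The two methods fill the same gaps with different values, which is why the study compares downstream classification accuracy for each imputed dataset rather than picking a method in advance. In practice, libraries such as scikit-learn (`SimpleImputer`, `KNNImputer`, `IterativeImputer`) and imbalanced-learn (`SMOTETomek`) provide maintained versions of these steps; whether the authors used them is not stated in this record.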
Pages: 21
Related papers
50 in total
  • [21] Using Machine Learning for the Risk Factors Classification of Glycemic Control in Type 2 Diabetes Mellitus
    Cheng, Yi-Ling
    Wu, Ying-Ru
    Lin, Kun-Der
    Lin, Chun-Hung Richard
    Lin, I-Mei
    HEALTHCARE, 2023, 11 (08)
  • [22] Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning
    Azmi, Putri Azmira R.
    Yusoff, Marina
    Sallehud-din, Mohamad Taufik Mohd
    ENERGY REPORTS, 2025, 13 : 264 - 277
  • [23] Systematic Review of Using Machine Learning in Imputing Missing Values
    Alabadla, Mustafa
    Sidi, Fatimah
    Ishak, Iskandar
    Ibrahim, Hamidah
    Affendey, Lilly Suriani
    Ani, Zafienas Che
    Jabar, Marzanah A.
    Bukar, Umar Ali
    Devaraj, Navin Kumar
    Muda, Ahmad Sobri
    Tharek, Anas
    Omar, Noritah
    Jaya, M. Izham Mohd
    IEEE ACCESS, 2022, 10 : 44483 - 44502
  • [24] Classification of missing values in spatial data using spin models
    Zukovic, Milan
    Hristopulos, Dionissios T.
    PHYSICAL REVIEW E, 2009, 80 (01)
  • [25] Imputation of Missing Values in the Fundamental Data: Using MICE Framework
    Meghanadh, Balasubramaniam
    Aravalath, Lagesh
    Joshi, Bhupesh
    Sathiamoorthy, Raghunathan
    Kumar, Manish
    JOURNAL OF QUANTITATIVE ECONOMICS, 2019, 17 (03) : 459 - 475
  • [27] Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
    Razzaghi, Talayeh
    Roderick, Oleg
    Safro, Ilya
    Marko, Nicholas
    PLOS ONE, 2016, 11 (05)
  • [28] Imbalanced data classification: Using transfer learning and active sampling
    Liu, Yang
    Yang, Guoping
    Qiao, Shaojie
    Liu, Meiqi
    Qu, Lulu
    Han, Nan
    Wu, Tao
    Yuan, Guan
    Peng, Yuzhong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117
  • [29] Classification of Imbalanced Data Using Deep Learning with Adding Noise
    Fan, Wan-Wei
    Lee, Ching-Hung
    JOURNAL OF SENSORS, 2021, 2021 (2021)
  • [30] Classification of Diabetes Types using Machine Learning
    Adigun, Oyeranmi
    Oyeranm, Folasade
    Yekini, Nureni
    Babatunde, Ronke
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (09) : 152 - 161