A Robust Machine Learning Framework for Diabetes Prediction

Authors
Olisah, Chollette [1]
Adeleye, Oluwaseun [2]
Smith, Lyndon [1]
Smith, Melvyn [1]
Affiliations
[1] Univ West England, Ctr Machine Vis, Bristol Robot Lab, Bristol, Avon, England
[2] Baze Univ, Dept Comp Sci, Abuja, Nigeria
Source
PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2021, VOL 2 | 2022, Vol. 359
Keywords
Diabetes mellitus; Spearman correlation; Polynomial regression; Random forest; Classification; Machine learning; PIMA Indian; IMPUTATION; TREES
DOI
10.1007/978-3-030-89880-9_58
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to adequately secrete or respond to insulin. If not diagnosed early and properly managed, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and can be life-threatening. Many years of research in the computational diagnosis of diabetes have shown machine learning to be a viable approach to predicting diabetes. However, the accuracy achieved to date suggests that there is still much room for improvement. In this paper, we propose a machine learning framework that improves the performance of diabetes prediction on the PIMA Indian dataset. Through analysis, we observe that the main challenges of the dataset that undermine learning are feature selection and missing values. For each of these challenges, we propose a working solution that applies Spearman correlation and polynomial regression from a new perspective. Further, we optimize the random forest classifier by tuning its hyperparameters using grid search with repeated stratified k-fold cross-validation, yielding a robust random forest model that scales to the prediction problem. Finally, through exhaustive experiments, we demonstrate that our proposed data preparation approaches lead to a robust machine learning framework for the diagnosis of diabetes mellitus, with training accuracies ranging from 98.96% to 100% and test accuracies ranging from 97.92% to 100%, outperforming all state-of-the-art results. The source code for the proposed machine learning framework is made publicly available.
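The abstract names three technical steps: Spearman-correlation feature selection, polynomial-regression handling of missing values, and random forest tuning by grid search scored with repeated stratified k-fold cross-validation. The sketch below shows one way these steps could be chained with pandas and scikit-learn. It is not the authors' released code. It runs on synthetic data with PIMA-like column names rather than the PIMA Indian dataset. The following are all illustrative assumptions: using Glucose as the regressor that imputes Insulin, the polynomial degree of 2, the Spearman threshold of 0.1, and the hyperparameter grid. The helper names `select_by_spearman`, `fit_poly_imputer`, and `apply_poly_imputer` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def select_by_spearman(df, target, threshold=0.1):
    """Hypothetical helper: keep features whose |Spearman rho| with the target is >= threshold."""
    rho = df.drop(columns=target).corrwith(df[target], method="spearman")
    return rho[rho.abs() >= threshold].index.tolist()


def fit_poly_imputer(df, column, predictor, degree=2):
    """Hypothetical helper: regress `column` on a polynomial of `predictor`, using rows where both are observed."""
    known = df[column].notna() & df[predictor].notna()
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(df.loc[known, [predictor]], df.loc[known, column])
    return model


def apply_poly_imputer(model, df, column, predictor):
    """Hypothetical helper: replace NaNs in `column` with the fitted polynomial's prediction."""
    out = df.copy()
    missing = out[column].isna() & out[predictor].notna()
    if missing.any():
        out.loc[missing, column] = model.predict(out.loc[missing, [predictor]])
    return out


# Synthetic stand-in for PIMA-style data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 400
glucose = rng.normal(120, 30, n)
bmi = rng.normal(32, 7, n)
insulin = 0.02 * (glucose - 60) ** 2 + rng.normal(0, 10, n)
latent = 0.04 * (glucose - 120) + 0.08 * (bmi - 32) + rng.logistic(size=n)
df = pd.DataFrame({
    "Glucose": glucose,
    "BMI": bmi,
    "Insulin": insulin,
    "Age": rng.integers(21, 80, n),
    "Noise": rng.normal(0, 1, n),
    "Outcome": (latent > 0).astype(int),
})
# PIMA stores missing Insulin values as 0; mimic that, then turn the zeros into NaN.
df.loc[rng.random(n) < 0.3, "Insulin"] = 0.0
df["Insulin"] = df["Insulin"].replace(0.0, np.nan)

# Split first so that imputation and feature selection see only training rows.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["Outcome"], random_state=0
)
imputer = fit_poly_imputer(train_df, "Insulin", "Glucose", degree=2)
train_df = apply_poly_imputer(imputer, train_df, "Insulin", "Glucose")
test_df = apply_poly_imputer(imputer, test_df, "Insulin", "Glucose")
features = select_by_spearman(train_df, "Outcome", threshold=0.1)

# Grid search over random forest hyperparameters, scored by repeated stratified k-fold CV.
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, None]},
    scoring="accuracy",
    cv=cv,
)
grid.fit(train_df[features], train_df["Outcome"])
test_acc = grid.score(test_df[features], test_df["Outcome"])
print("selected:", features)
print("best params:", grid.best_params_)
print("test accuracy:", round(test_acc, 3))
```

In this sketch the imputer and the feature filter are fitted on the training split only. This keeps the held-out accuracy from absorbing information from the test rows. The accuracy it prints reflects the synthetic data and says nothing about the results reported in the paper.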
Pages: 775-792
Page count: 18