A Robust Machine Learning Framework for Diabetes Prediction

Authors
Olisah, Chollette [1]
Adeleye, Oluwaseun [2]
Smith, Lyndon [1]
Smith, Melvyn [1]
Affiliations
[1] Univ West England, Ctr Machine Vis, Bristol Robot Lab, Bristol, Avon, England
[2] Baze Univ, Dept Comp Sci, Abuja, Nigeria
Source
PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2021, VOL 2 | 2022 / Vol. 359
Keywords
Diabetes mellitus; Spearman correlation; Polynomial regression; Random forest; Classification; Machine learning; PIMA Indian; IMPUTATION; TREES
DOI
10.1007/978-3-030-89880-9_58
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to secrete and respond to insulin. If not properly managed or diagnosed in time, diabetes can pose a risk to vital body organs such as the eyes, kidneys, nerves, heart, and blood vessels, and can be life-threatening. Over many years of research into the computational diagnosis of diabetes, machine learning has proven to be a viable approach to diabetes prediction. However, the accuracy rates reported to date suggest that there is still much room for improvement. In this paper, we propose a machine learning framework to improve the performance of diabetes prediction on the PIMA Indian dataset. Through analysis, we observe that the main challenges of the dataset, which impair learning, are feature selection and missing values. For each of these challenges, we propose a working solution that incorporates Spearman correlation and polynomial regression from a new perspective. Further, we optimize the random forest classifier by tuning its hyperparameters using grid search and repeated stratified k-fold cross-validation to build a robust random forest model that scales to the prediction problem. Finally, through exhaustive experiments, we demonstrate that our proposed data preparation approaches lead to a robust machine learning framework for the diagnosis of diabetes mellitus, with training and test accuracy values that range from 98.96% to 100% and from 97.92% to 100%, respectively, outperforming all the state-of-the-art results. The source code for the proposed machine learning framework is made publicly available.
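As an illustration of the Spearman-correlation step named in the abstract, the following is a minimal sketch of ranking features by their rank correlation with the outcome. It is not the authors' implementation: the abstract does not describe how the correlation is applied "from a new perspective", so the selection rule here (sorting by absolute correlation), the function names, and the toy values are all assumptions made for the example, and the data are invented rather than taken from the PIMA Indian dataset.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_features(columns, target):
    """Sort features by |Spearman correlation| with the target (assumed rule)."""
    scores = {name: spearman(col, target) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy values, invented for illustration only.
toy_columns = {
    "glucose": [85, 90, 120, 150, 170, 95],
    "skin_thickness": [20, 35, 18, 30, 22, 33],
}
toy_outcome = [0, 0, 1, 1, 1, 0]

ranking = rank_features(toy_columns, toy_outcome)
for name, rho in ranking:
    print(f"{name}: rho = {rho:+.3f}")
```

In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written functions; the standard-library version is used here only to keep the sketch free of dependencies.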
Pages: 775-792
Page count: 18