A Robust Machine Learning Framework for Diabetes Prediction

Times cited: 0
Authors
Olisah, Chollette [1]
Adeleye, Oluwaseun [2]
Smith, Lyndon [1]
Smith, Melvyn [1]
Affiliations
[1] University of the West of England, Centre for Machine Vision, Bristol Robotics Laboratory, Bristol, Avon, England
[2] Baze University, Department of Computer Science, Abuja, Nigeria
Source
PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2021, VOL 2 | 2022 / Vol. 359
Keywords
Diabetes mellitus; Spearman correlation; Polynomial regression; Random forest; Classification; Machine learning; PIMA Indian; Imputation; Trees
DOI
10.1007/978-3-030-89880-9_58
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to secrete or respond to insulin. If not properly managed or diagnosed on time, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and can be life-threatening. Many years of research in the computational diagnosis of diabetes have shown machine learning to be a viable solution for predicting diabetes. However, the accuracy reported to date suggests that there is still much room for improvement. In this paper, we propose a machine learning framework to improve the performance of diabetes prediction on the PIMA Indian dataset. Through analysis, we observe that the main challenges of the dataset that impair learning are feature selection and missing values. For each of these challenges, we propose a working solution that incorporates Spearman correlation and polynomial regression from a new perspective. Further, we optimize the random forest classifier by tuning its hyperparameters using grid search and repeated stratified k-fold cross-validation, building a robust random forest model that scales to the prediction problem. Finally, through exhaustive experiments, we demonstrate that our proposed data preparation approaches lead to a robust machine learning framework for the diagnosis of diabetes mellitus, with training and test accuracies ranging from 98.96% to 100% and from 97.92% to 100%, respectively, outperforming all state-of-the-art results. The source code for the proposed machine learning framework is publicly available.
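The abstract names two data-preparation ideas, Spearman correlation and polynomial regression, without giving their details. The sketch below is a minimal, standard-library-only illustration under our own assumptions; it is not the authors' released code. We assume, as the keywords suggest, that Spearman correlation serves feature selection and polynomial regression serves imputation. Features are ranked by the absolute Spearman correlation with the outcome, and zero entries (a convention often applied to the PIMA Indian dataset, where physiologically impossible zeros appear in columns such as Insulin) are filled by a polynomial fitted against one fully observed predictor column. The function names (`rank_features`, `impute_polynomial`, etc.), the zero-as-missing convention, the default degree of 2, and the choice of predictor column are all hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no rank information
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_features(columns, outcome):
    """Hypothetical helper: order feature names by |rho| with the outcome."""
    scores = {name: spearman(col, outcome) for name, col in columns.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c where c[k] multiplies x**k.
    """
    m = degree + 1
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back-substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = s / a[r][r]
    return coeffs


def impute_polynomial(target, predictor, degree=2, missing=0):
    """Hypothetical helper: replace `missing` entries of `target` with a
    polynomial prediction from `predictor`, fitted on the rows where
    `target` is observed. Assumes `predictor` has no missing values."""
    observed = [(p, t) for p, t in zip(predictor, target) if t != missing]
    coeffs = polyfit([p for p, _ in observed], [t for _, t in observed], degree)

    def predict(p):
        return sum(c * p ** k for k, c in enumerate(coeffs))

    return [predict(p) if t == missing else t for p, t in zip(predictor, target)]
```

Solving the normal equations directly keeps the sketch dependency-free but becomes ill-conditioned at high degrees; a library routine such as `numpy.polyfit` is the usual choice in practice. For the tuning step, scikit-learn's `GridSearchCV` wrapped around a `RandomForestClassifier`, with a `RepeatedStratifiedKFold` splitter passed as `cv`, is one common way to express grid search under repeated stratified k-fold cross-validation; the grid and fold counts the authors used are not stated in this record.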
Pages: 775-792
Page count: 18