Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 35
Authors
Daghistani, Tahani [1]
Alshammari, Riyad [1]
Affiliation
[1] King Saud Bin Abdulaziz Univ Hlth Sci KSAU HS, King Abdullah Int Med Res Ctr KAIMRC, Coll Publ Hlth & Hlth Informat, Hlth Informat Dept,Minist Natl Guard Hlth Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Diabetes is a global healthcare concern and one of the leading health challenges in Saudi Arabia. Its prevalence is expected to rise, and early identification of individuals at high risk remains a significant challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Health Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes based on 18 risk factors. The two algorithms were compared on Precision, Recall, True Positive Rate, False Positive Rate, F-measure, and area under the ROC curve (AUC). The overall prevalence of diabetes in the data set is 64.47%. Males represent 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the Precision, Recall, True Positive Rate, False Positive Rate, and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188, and 0.876, respectively, while for the Logistic Regression model they were only 0.692, 0.703, 0.703, 0.454, and 0.675, respectively. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, indicating higher predictive performance for RF. Across these metrics, the RF algorithm outperformed Logistic Regression in predicting diabetes.
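As an illustration of the evaluation criteria named in the abstract, the following is a minimal Python sketch of how Precision, Recall (which equals the True Positive Rate), False Positive Rate, F-measure, and AUC are defined for a binary classifier. The function names and all counts and scores are hypothetical and chosen only for the example; they are not the study's data, and the sketch does not claim to reproduce how the authors computed or averaged their reported figures.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # recall is the true positive rate
    fpr = fp / (fp + tn)             # false positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "f_measure": f_measure}


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
area = auc(pos_scores=[0.9, 0.8], neg_scores=[0.1, 0.85])
print(metrics, area)
```

A higher AUC means the model more often assigns a higher score to a diabetic record than to a non-diabetic one, which is the sense in which the abstract ranks RF above Logistic Regression.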
Pages: 78-83
Number of pages: 6