Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 35
Authors
Daghistani, Tahani [1]
Alshammari, Riyad [1]
Affiliation
[1] King Saud Bin Abdulaziz Univ Hlth Sci KSAU HS, King Abdullah Int Med Res Ctr KAIMRC, Coll Publ Hlth & Hlth Informat, Hlth Informat Dept, Minist Natl Guard Hlth Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes is a global healthcare concern and one of the leading health challenges in Saudi Arabia. Its prevalence is expected to rise, so early identification of individuals at high risk of diabetes is an important challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Health Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes based on 18 risk factors. The two algorithms were compared on precision, recall, true positive rate, false positive rate, F-measure, and area under the ROC curve (AUC). The overall prevalence of diabetes in the data set was 64.47%. Males represented 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the precision, recall, true positive rate, false positive rate, and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188, and 0.876, respectively, while for the Logistic Regression model they were 0.692, 0.703, 0.703, 0.454, and 0.675, respectively. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, indicating higher predictive performance for RF. The RF algorithm outperformed the Logistic Regression technique in predicting diabetes across the reported metrics.
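As a reading aid for the metrics named in the abstract, the following is a minimal sketch of how precision, recall (which is the same quantity as the true positive rate, hence the identical values reported above), false positive rate, and F-measure follow from a binary confusion matrix. The `Confusion` class, the `metrics` function, and the counts in the usage lines are hypothetical and are not taken from the study; the abstract does not state how its per-class values were averaged, so this sketch covers only the single-class case.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic


def metrics(c: Confusion) -> dict:
    """Compute the standard single-class evaluation metrics."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)          # identical to the true positive rate
    fpr = c.fp / (c.fp + c.tn)             # false positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "tpr": recall,
        "fpr": fpr,
        "f_measure": f_measure,
    }


# Illustrative counts only; these are NOT the study's data.
example = metrics(Confusion(tp=80, fp=20, fn=20, tn=80))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

AUC is not derived from a single confusion matrix; it summarizes the true positive rate against the false positive rate over all decision thresholds, which is why it is reported separately.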
Pages: 78-83
Page count: 6