Chi-Square and PCA Based Feature Selection for Diabetes Detection with Ensemble Classifier

Cited by: 9
Authors
Rupapara, Vaibhav [1]
Rustam, Furqan [2]
Ishaq, Abid [2]
Lee, Ernesto [3]
Ashraf, Imran [4]
Affiliations
[1] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL USA
[2] Khwaja Fareed Univ Engn & Informat Technol, Dept Comp Sci, Rahim Yar Khan 64200, Pakistan
[3] Broward Coll, Dept Comp Sci, Broward Cty, FL USA
[4] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Keywords
Diabetes mellitus prediction; feature fusion; ensemble classifier; principal component analysis; chi-square
DOI
10.32604/iasc.2023.028257
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a metabolic disease ranked among the top 10 causes of death by the World Health Organization. In recent years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in deaths among males. If left untreated, it causes complications in many vital organs of the human body, which may be fatal. Early detection of diabetes is therefore important for starting timely treatment. This study introduces a methodology for classifying diabetic and non-diabetic individuals using an ensemble machine learning model and a fusion of Chi-square and principal component analysis features. The proposed ensemble model, the logistic tree classifier (LTC), combines logistic regression and an extra tree classifier through a soft voting mechanism. Experiments are also performed with several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k-nearest neighbors. In addition, experiments using principal component analysis (PCA) and Chi-square (Chi-2) features examine the influence of feature selection on the performance of the classifiers. Results indicate that Chi-2 features perform better than both PCA features and the original features. However, the highest accuracy is obtained when the proposed LTC ensemble is used with the proposed feature fusion framework, which achieves an accuracy of 0.85, the highest among the available approaches for diabetes prediction. Finally, a statistical t-test confirms the statistical significance of the proposed approach over the other approaches.
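The pipeline described in the abstract can be sketched with scikit-learn and SciPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the data are a synthetic `make_classification` stand-in for a real diabetes table, the "feature fusion" is assumed to be a column-wise concatenation of Chi-2 selected features and PCA components (via `FeatureUnion`), and `k=4`, `n_components=4`, `n_estimators=200`, the 10-fold split, and the plain logistic regression baseline for the t-test are hypothetical choices. Min-max scaling comes first because `chi2` requires non-negative feature values.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data: 768 rows x 8 numeric features, binary label.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Assumed fusion: concatenate the k best Chi-2 features with PCA components.
fusion = FeatureUnion([
    ("chi2", SelectKBest(score_func=chi2, k=4)),
    ("pca", PCA(n_components=4, random_state=0)),
])

# LTC-style ensemble: soft voting averages the predicted class probabilities
# of logistic regression and an extra-trees classifier.
ltc = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",
)

model = Pipeline([
    ("scale", MinMaxScaler()),  # chi2 needs non-negative inputs at fit time
    ("fuse", fusion),
    ("clf", ltc),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"held-out accuracy: {acc:.3f}")

# Illustrative significance check (assumed procedure): paired t-test on
# 10-fold accuracies of the fused LTC pipeline vs. plain logistic regression.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
baseline = Pipeline([("scale", MinMaxScaler()),
                     ("clf", LogisticRegression(max_iter=1000))])
ltc_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
lr_scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")
t_stat, p_value = ttest_rel(ltc_scores, lr_scores)
print(f"mean LTC = {ltc_scores.mean():.3f}, mean LR = {lr_scores.mean():.3f}, "
      f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

To use a real dataset, replace the `make_classification` call with the actual feature matrix and labels. Placing the scaler, Chi-2 selection, and PCA inside one `Pipeline` means they are refitted on each training split only, so test rows never influence feature selection.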
Pages: 1931-1949
Number of pages: 19