Impact of categorical and numerical features in ensemble machine learning frameworks for heart disease prediction

Cited by: 14
Authors
Pan, Chandan [1]
Poddar, Arnab [2]
Mukherjee, Rohan [3]
Ray, Ajoy Kumar [1, 2]
Affiliations
[1] JIS Inst Adv Studies & Res, Ctr Data Sci, Sect 5, Kolkata, India
[2] Indian Inst Technol Kharagpur, Dept Elect & Elect Commun Engn, Kharagpur, W Bengal, India
[3] Indian Inst Management Bodh Gaya, Bodh Gaya, India
Keywords
Health analytics; Cardiovascular diseases; Heart disease prediction; Machine learning algorithms; Ensemble mechanism; IDENTIFICATION; REGRESSION; DIAGNOSIS; ALGORITHM
DOI
10.1016/j.bspc.2022.103666
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Cardiovascular disease (CVD), or heart disease, has been among the deadliest diseases worldwide throughout the last decade. In most cases, CVD prediction depends on a combination of clinical and pathological data represented by either numerical or categorical variables. Categorical medical data may inherently embed prior medical information through the process of categorisation, whereas numerical data allow precise measurements and readings. Hence it is necessary to assess the impact of categorical and numerical features on CVD prediction. In this work, an exhaustive analysis of numerical features, categorical features, and the combination of both is carried out in the context of state-of-the-art machine learning algorithms. The work compares boosting algorithms (Gradient Boosting, Extreme Gradient Boosting (XGBoost), AdaBoost and CatBoost) as well as artificial neural networks, random forest, support vector machines (SVM), decision tree and logistic regression. A soft voting ensemble mechanism over these learning algorithms is also implemented to predict CVD. The work uses a publicly available and widely used benchmark dataset: the Cleveland heart disease dataset (UCI repository). It reports ten different performance metrics, which consistently demonstrate that the categorical features outperform the numerical and combined features. It is further observed that the ensemble of SVM + AdaBoost classifiers with categorical features produces the best CVD prediction performance.
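The soft voting mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_vote`, the optional `weights` argument, and the probability values below are all hypothetical and chosen only for illustration. Soft voting averages the class probabilities predicted by each classifier and picks the class with the highest average.

```python
def soft_vote(probas, weights=None):
    """Combine per-class probabilities from several classifiers.

    probas:  list of probability vectors, one per classifier,
             each of the form [P(class 0), P(class 1), ...].
    weights: optional per-classifier weights (assumed equal if omitted).
    Returns (averaged probabilities, index of the predicted class).
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    # Weighted average of each class probability across classifiers.
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    predicted = max(range(n_classes), key=lambda c: avg[c])
    return avg, predicted


# Hypothetical outputs [P(no disease), P(disease)] for a single patient.
svm_proba = [0.40, 0.60]  # made-up SVM probabilities
ada_proba = [0.55, 0.45]  # made-up AdaBoost probabilities

avg, label = soft_vote([svm_proba, ada_proba])
print(avg, label)
```

In this made-up case the two classifiers disagree on the most likely class, so a majority (hard) vote would tie, while soft voting resolves the disagreement through the averaged probabilities.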
Pages: 13