Improving coronary heart disease prediction with real-life dataset: a stacked generalization framework with maximum clinical attributes and SMOTE balancing for imbalanced data

Cited by: 0
Authors
Dubey M. [1]
Tembhurne J. [1]
Makhijani R. [1]
Affiliation
[1] Department of Computer Science and Engineering, Indian Institute of Information Technology, Maharashtra, Nagpur
Keywords
Coronary heart disease; Framingham Heart Study; Outlier detection; SMOTE; Stacked generalization
DOI
10.1007/s11042-024-19429-9
Abstract
Heart disease increases the strain on the heart by reducing its ability to pump blood throughout the body, which can lead to heart attacks and strokes. It is becoming a global threat, driven by unhealthy lifestyles, physical inactivity, a history of stroke, and existing medical conditions. In predictive analytics, many studies have proposed models that warn of forthcoming heart disease based on various attributes. Although these studies report good performance metrics, their models were trained on only a few features. This study aims to train the model on all the essential attributes for heart disease prediction in the Framingham Heart Study (FHS) dataset. The dataset is pre-processed with interquartile range (IQR) outlier detection, followed by oversampling with the Synthetic Minority Oversampling Technique (SMOTE). We propose a stacked generalization approach in which several machine learning classifiers, namely logistic regression, random forest, K-nearest neighbour, Naïve Bayes, support vector machine, XGBoost, and decision tree, are trained with optimized hyperparameters to obtain the best learner for predicting coronary heart disease with improved performance. The proposed model is tested on both the original imbalanced and the SMOTE-balanced FHS dataset. On the original (imbalanced) FHS dataset, logistic regression achieves 86.51% accuracy, while on the SMOTE-balanced FHS dataset the support vector machine outperforms the other individual models with an accuracy of 93.07%. The proposed stacked generalization approach with cross-validation reaches 97.2% accuracy on the SMOTE-balanced dataset. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
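The two pre-processing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 1.5 × IQR fence multiplier, the neighbour count, and the toy (age, systolic blood pressure) values are all assumptions made for illustration, and the oversampler is a simplified SMOTE-style interpolation rather than a full SMOTE implementation.

```python
import random
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier; the abstract does not state one.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, k=1.5):
    """Keep only rows whose every feature lies inside that feature's fences."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        row for row in rows
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, bounds))
    ]


def smote_like(minority, n_new, k_neighbors=2, seed=0):
    """Create n_new synthetic minority points.

    Each point is a random interpolation between a minority sample and one
    of its k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy rows: (age, systolic blood pressure).
rows = [(50.0, 120.0), (52.0, 125.0), (48.0, 118.0),
        (51.0, 122.0), (49.0, 121.0), (53.0, 300.0)]
clean = drop_outlier_rows(rows)

# Hypothetical minority-class samples to be oversampled.
minority = [(60.0, 150.0), (62.0, 155.0), (61.0, 148.0)]
new_points = smote_like(minority, n_new=4)
```

In this toy run, the row with a systolic pressure of 300 falls outside the upper fence and is dropped, and every synthetic point lies on a line segment between two existing minority samples. In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.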
Pages: 85139-85168
Page count: 29