A Multicriteria Weighted Vote-Based Classifier Ensemble for Heart Disease Prediction

Cited by: 23
Authors
Bashir, Saba [1]
Qamar, Usman [1]
Khan, Farhan Hassan [1]
Affiliations
[1] Natl Univ Sci & Technol, Dept Comp Engn, Coll Elect & Mech Engn, Islamabad, Pakistan
Keywords
ensemble; weighted vote; naive Bayes; decision tree; Gini index; information gain; instance-based learner; support vector machine; ANOVA; cross-validation; data mining techniques
DOI
10.1111/coin.12070
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The availability of large amounts of medical data creates a need for intelligent disease prediction and analysis tools that can extract hidden information. Many data-mining and statistical analysis tools are used for disease prediction, and single data-mining techniques show an acceptable level of accuracy for heart disease diagnosis. This article focuses on the prediction and analysis of heart disease using a weighted vote-based classifier ensemble technique. The proposed ensemble model overcomes the limitations of conventional data-mining techniques by combining five heterogeneous classifiers: naive Bayes, a decision tree based on the Gini index, a decision tree based on information gain, an instance-based learner, and support vector machines. We used five benchmark heart disease data sets taken from the UCI repository. Each data set has a different feature space that ultimately leads to the prediction of heart disease. The effectiveness of the proposed ensemble classifier is investigated by comparing its performance with techniques from other researchers. Tenfold cross-validation is used to handle the class imbalance problem. Moreover, confusion matrices and analysis of variance statistics are used to show the prediction results of all classifiers. The experimental results verify that the proposed ensemble classifier can deal with all types of attributes and that it achieves a high diagnostic accuracy of 87.37%, sensitivity of 93.75%, specificity of 92.86%, and F-measure of 82.17%. An F-ratio higher than the F-critical value and a p-value less than 0.01 for a 95% confidence interval indicate that the results are statistically significant for all the data sets.
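As a rough illustration of the weighted-vote idea described in the abstract, the following Python sketch combines the labels predicted by five classifiers and derives the reported kinds of metrics from a confusion matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the multicriteria weights are computed, so the weights, the function names `weighted_vote` and `binary_metrics`, and all numbers below are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical values;
             the abstract does not state how the weights are derived).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties go to the smallest label, only to keep the sketch deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and F-measure
    from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical votes from: naive Bayes, Gini tree, information-gain tree,
# instance-based learner, SVM (1 = disease, 0 = no disease).
votes = [1, 0, 1, 0, 0]
weights = [0.9, 0.6, 0.8, 0.5, 0.5]  # illustrative weights only
ensemble_label = weighted_vote(votes, weights)
```

With these illustrative weights the two more heavily weighted classifiers outvote the three lighter ones, which is the behavior that distinguishes a weighted vote from a simple majority vote.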
Pages: 615-645
Number of pages: 31