Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms

Cited by: 189
Authors
Ozcift, Akin [1]
Gulten, Arif [2]
Affiliations
[1] Gaziantep Univ, Gaziantep Vocat Sch Higher Educ, Comp Programming Div, Gaziantep, Turkey
[2] Firat Univ, Fac Engn, Elect Elect Dept, TR-23169 Elazig, Turkey
Keywords
Rotation forest; Ensemble learning; Classifier performance; Parkinson's; Diabetes; Cleveland heart; Computer aided diagnosis; SELECTION; ACCURACY; SYSTEM;
DOI
10.1016/j.cmpb.2011.03.018
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier may be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms and evaluate their classification performance on Parkinson's, diabetes, and heart disease datasets from the literature. The experiments proceed in three steps. First, the feature dimension of the three datasets is reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed with the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms (30 base classifiers and 30 RF ensembles) is evaluated using three metrics: classification accuracy (ACC), kappa error (KE), and area under the receiver operating characteristic (ROC) curve (AUC). The base classifiers achieved average accuracies of 72.15%, 77.52%, and 84.43% for the diabetes, heart, and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49%, and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms in designing advanced CADx systems. (C) 2011 Elsevier Ireland Ltd. All rights reserved.
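The evaluation protocol described in the abstract (leave-one-out validation scored by accuracy and a kappa statistic) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the record does not define "kappa error", so Cohen's kappa is used; the 1-nearest-neighbor function is a hypothetical stand-in for a base classifier, not one of the paper's 30 algorithms; and the six-sample dataset is invented. The sketch covers neither CFS, rotation forest construction, nor AUC.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (ACC)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Assumption: the record's "kappa error" is taken to mean this statistic.
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return 0.0  # single-class case; a convention chosen for this sketch
    return (p_observed - p_chance) / (1.0 - p_chance)


def nearest_neighbor_predict(train_X, train_y, x):
    """Toy 1-NN classifier (hypothetical stand-in for a base classifier)."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row, x))

    best = min(range(len(train_X)), key=lambda i: squared_distance(train_X[i]))
    return train_y[best]


def leave_one_out_predictions(X, y, predict):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


# Invented one-feature toy data, not taken from any of the three datasets.
X = [[0.0], [0.1], [0.2], [0.35], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]

preds = leave_one_out_predictions(X, y, nearest_neighbor_predict)
print("ACC:", accuracy(y, preds))
print("kappa:", cohen_kappa(y, preds))
```

Because each sample is held out exactly once, the number of train/predict rounds equals the number of samples, which is practical for small clinical datasets but grows costly for large ones.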
Pages: 443-451
Page count: 9