Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis

Cited by: 118
Authors
Ozcift, Akin [1]
Affiliation
[1] Gaziantep Univ, Comp Programming Div, Gaziantep Vocat Sch, Gaziantep, Turkey
Keywords
Cardiac arrhythmia; Backward elimination; Filter feature selection; Random forests ensemble classification; Simple random sampling; Correlation based feature selection
DOI
10.1016/j.compbiomed.2011.03.001
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a Random Forests (RF) ensemble classifier trained with a resampling strategy to improve the diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. As a result, an RF ensemble classifier generally achieves better classification performance than a single tree. In general, multiclass datasets with an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. The cardiac arrhythmia dataset is of this kind: it has multiple classes with small sample sizes and is therefore suitable for testing our resampling-based training strategy. The dataset contains 452 samples across fourteen types of arrhythmia, and eleven of these classes have fewer than 15 samples. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is applied to the selected features, with and without simple random sampling, to evaluate the efficiency of the proposed training strategy. The resulting classifier accuracy is 90.0%, which is a high diagnostic performance for cardiac arrhythmia. Furthermore, three case studies (thyroid, cardiotocography and audiology) are used to benchmark the effectiveness of the proposed method. The experimental results demonstrate the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. (C) 2011 Elsevier Ltd. All rights reserved.
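The two mechanisms named in the abstract, simple random sampling of the training data and taking the mode of the individual trees' outputs, can be illustrated with a minimal standard-library sketch. This is not the paper's implementation. The function names, the toy data, the choice to sample with replacement, and the resampled set size are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter


def simple_random_sample(rows, size, seed=0):
    """Draw `size` rows uniformly at random.

    Assumption: sampling is done WITH replacement; the abstract does not
    say which variant of simple random sampling was used.
    """
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(size)]


def majority_vote(tree_predictions):
    """Return the mode of the class labels predicted by the individual trees.

    On a tie, the label that appears first in `tree_predictions` wins.
    """
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


# Hypothetical toy data: (feature_vector, class_label) pairs with an
# unbalanced class distribution.
training_rows = (
    [([0.1, 1.2], "normal")] * 8
    + [([0.9, 0.3], "arrhythmia_a")] * 2
)

# Resample the training set to the same size before fitting the ensemble.
resampled = simple_random_sample(training_rows, size=len(training_rows), seed=42)

# Hypothetical per-tree predictions for one test sample.
votes = ["normal", "arrhythmia_a", "normal"]
predicted = majority_vote(votes)
```

In practice each tree would be fitted on the resampled rows by a library implementation of random forests; the sketch only shows the sampling step and the voting rule.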
Pages: 265-271
Number of pages: 7