Cost-sensitive decision tree ensembles for effective imbalanced classification

Cited by: 265
Authors
Krawczyk, Bartosz [1]
Wozniak, Michal [1]
Schaefer, Gerald [2]
Affiliations
[1] Wroclaw Univ Technol, Dept Syst & Comp Networks, PL-50370 Wroclaw, Poland
[2] Univ Loughborough, Dept Comp Sci, Loughborough, Leics, England
Keywords
Machine learning; Multiple classifier system; Ensemble classifier; Imbalanced classification; Cost-sensitive classification; Decision tree; Classifier selection; Evolutionary algorithms; Classifier fusion; COMBINING CLASSIFIERS; PATTERN-RECOGNITION; MINORITY CLASS; ALGORITHMS;
DOI
10.1016/j.asoc.2013.08.014
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximising overall classification accuracy is not appropriate when dealing with such problems. Various approaches introduced in the literature to deal with imbalanced datasets are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets and is shown to improve recognition of the minority class, to outperform other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets. (C) 2013 Elsevier B.V. All rights reserved.
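
To illustrate the kind of pipeline the abstract describes, the sketch below builds cost-sensitive decision trees on random feature subspaces and fuses them with a weighted vote. It is a minimal illustration, not the authors' implementation: the cost matrix is encoded here as scikit-learn class weights, a simple random search over fusion weights stands in for the evolutionary classifier selection and weighting step, and the dataset, cost values and all parameter settings are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of a cost-sensitive tree ensemble
# on random feature subspaces with weighted-vote fusion. All settings are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

# Imbalanced toy dataset (assumption: roughly 10% minority class).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost matrix encoded as class weights: misclassifying the minority class is 9x costlier.
cost_weights = {0: 1.0, 1: 9.0}

# Train base classifiers on random feature subspaces to encourage diversity.
n_members, subspace_size = 15, 10
subspaces, members = [], []
for _ in range(n_members):
    feats = rng.choice(X.shape[1], size=subspace_size, replace=False)
    tree = DecisionTreeClassifier(class_weight=cost_weights, random_state=0)
    tree.fit(X_tr[:, feats], y_tr)
    subspaces.append(feats)
    members.append(tree)

def weighted_vote(weights, X_eval):
    # Fuse member outputs by a weighted sum of class-probability estimates.
    scores = sum(w * m.predict_proba(X_eval[:, f])
                 for w, m, f in zip(weights, members, subspaces))
    return scores.argmax(axis=1)

# Stand-in for the evolutionary search: keep the best of a few random weight vectors.
best_w, best_score = None, -1.0
for _ in range(200):
    w = rng.random(n_members)
    score = balanced_accuracy_score(y_tr, weighted_vote(w, X_tr))
    if score > best_score:
        best_w, best_score = w, score

print("test balanced accuracy:", balanced_accuracy_score(y_te, weighted_vote(best_w, X_te)))
```

Members whose searched weight ends up near zero contribute almost nothing to the fused decision, which is how this sketch approximates the simultaneous classifier selection and weight assignment that the paper delegates to an evolutionary algorithm.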
Pages: 554-562
Page count: 9