Leveraging ensemble pruning for imbalanced data classification

Cited by: 4
Authors
Krawczyk, Bartosz [1 ]
Wozniak, Michal [2 ]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Med Coll Virginia Campus, Richmond, VA 23284 USA
[2] Wroclaw Univ Sci & Technol, Dept Syst & Comp Networks, Wroclaw, Poland
Source
2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC) | 2018
Keywords
machine learning; imbalanced data; ensemble learning; ensemble pruning; classifiers; performance; diversity
DOI
10.1109/SMC.2018.00084
Chinese Library Classification
TP3 [computing technology; computer technology]
Subject Classification Code
0812
Abstract
The effectiveness of machine learning algorithms depends on the quality of the supplied training data, and any problems embedded in the nature of the data will result in incorrect classification models. Imbalanced data distribution is among the most significant learning difficulties that can affect classifiers: when one class has many more instances than the other, the learning process becomes biased towards it. Therefore, methods for alleviating the impact of skewed distributions are highly sought after. Ensemble learning has emerged as one of the leading paradigms for imbalanced data. However, creating an efficient pool of classifiers is not a trivial task, and one needs to carefully select which classifiers should be combined to obtain the best predictive power. In this paper, we propose a compound ensemble pruning algorithm for imbalanced data. It aims to retain classifiers that offer the best performance on both the minority and majority classes while displaying a high level of diversity; the remaining learners are discarded from the pool. This is achieved by means of a multi-criteria evolutionary algorithm. An extensive experimental study shows that our proposal creates smaller ensembles than state-of-the-art methods while offering improved robustness to imbalanced class distributions.
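To make the pruning idea concrete, here is a minimal Python sketch of evolutionary ensemble pruning driven by the three criteria the abstract names: minority-class performance, majority-class performance, and diversity. It is an illustration only, not the authors' implementation: it scalarizes the three criteria into a single fitness instead of using a true multi-criteria evolutionary algorithm, and the bagged-tree pool, the mean-pairwise-disagreement diversity measure, the size penalty, and all GA settings are assumptions chosen for brevity.

    # Sketch: evolutionary pruning of a classifier pool for imbalanced data.
    # NOTE: scalarized fitness and all hyperparameters below are illustrative
    # assumptions; the paper itself uses a multi-criteria evolutionary algorithm.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import recall_score

    rng = np.random.default_rng(0)

    # Imbalanced binary problem: class 1 is the minority (~10%).
    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

    # Build a pool of bagged trees and cache their validation predictions.
    POOL = 30
    pool, preds = [], []
    for _ in range(POOL):
        idx = rng.integers(0, len(X_tr), len(X_tr))        # bootstrap sample
        clf = DecisionTreeClassifier(max_depth=5).fit(X_tr[idx], y_tr[idx])
        pool.append(clf)
        preds.append(clf.predict(X_val))
    preds = np.array(preds)                                # shape (POOL, n_val)

    def fitness(mask):
        """Scalarized criteria: minority recall, majority recall, diversity."""
        if mask.sum() == 0:
            return -1.0
        sub = preds[mask]
        vote = (sub.mean(axis=0) >= 0.5).astype(int)       # majority voting
        rec_min = recall_score(y_val, vote, pos_label=1)   # minority class
        rec_maj = recall_score(y_val, vote, pos_label=0)   # majority class
        k = int(mask.sum())
        div = 0.0                                          # mean pairwise disagreement
        if k > 1:
            pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
            div = np.mean([np.mean(sub[i] != sub[j]) for i, j in pairs])
        # Small size penalty favors smaller pruned ensembles.
        return rec_min + rec_maj + 0.5 * div - 0.01 * k

    # Simple evolutionary loop over binary selection masks.
    pop = rng.random((20, POOL)) < 0.5
    for _ in range(50):
        fits = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(fits)[-10:]]              # truncation selection
        children = parents.copy()
        children ^= rng.random(children.shape) < 0.05      # bit-flip mutation
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    pruned = [clf for clf, keep in zip(pool, best) if keep]
    print(f"selected {best.sum()} of {POOL} classifiers, fitness={fitness(best):.3f}")

A faithful reproduction would replace the scalarized fitness with Pareto-based selection (e.g., NSGA-II) so that the trade-offs between minority recall, majority recall, and diversity stay explicit rather than being fixed by hand-picked weights.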
Pages: 439-444
Page count: 6