Imbalance learning using heterogeneous ensembles

Cited by: 39
Authors
Zefrehi, Hossein Ghaderi [1 ]
Altincay, Hakan [1 ]
Affiliations
[1] Eastern Mediterranean University, Dept. of Computer Engineering, Famagusta, North Cyprus, via Mersin 10, Turkey
Keywords
Imbalance learning; Classifier ensembles; Bagging; Boosting; Heterogeneous ensembles; Multiple balancing methods; CLASSIFICATION; CLASSIFIERS; DIVERSITY; SELECTION; ACCURACY;
DOI
10.1016/j.eswa.2019.113005
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In binary classification, the class-imbalance problem occurs when the number of samples in one class is much larger than that of the other. In such cases, a classifier's performance on the minority class is generally poor. Classifier ensembles are used to tackle this problem: each member is trained on a different balanced dataset, obtained by randomly undersampling the majority class and/or randomly oversampling the minority class. Although the primary target of imbalance learning is the minority class, undersampling-based schemes employ the same minority sample set for all members, whereas oversampling the minority is challenging due to its unclear structure. On the other hand, heterogeneous ensembles, which utilize multiple learning algorithms, have a higher potential for generating diverse members than homogeneous ones. This study addresses the use of heterogeneous ensembles for imbalance learning. Experiments are conducted on 66 datasets to explore the relation between ensemble heterogeneity and performance, measured by AUC and F1 scores. The results show that performance improves as the number of classification methods is increased from one to five. Moreover, heterogeneous ensembles achieve significantly higher scores than homogeneous ones. It is also observed that multiple balancing schemes contribute to the performance of some homogeneous and heterogeneous ensembles; however, the improvements are not significant for either approach. (C) 2019 Elsevier Ltd. All rights reserved.
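The ensemble construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn estimators as the heterogeneous members, and the helper `undersample` (a name introduced here for illustration) builds a fresh balanced training set per member by randomly undersampling the majority class.

```python
# Hedged sketch of a heterogeneous ensemble for imbalanced binary data:
# each member uses a different learning algorithm and is trained on its
# own balanced set built by randomly undersampling the majority class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def undersample(X, y, rng):
    """Randomly undersample the majority class down to the minority size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = rng.choice(np.flatnonzero(y == majority),
                         size=min_idx.size, replace=False)
    idx = np.concatenate([min_idx, maj_idx])
    return X[idx], y[idx]

# Imbalanced toy data: roughly 10% minority class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)

# Heterogeneous members: four different learning algorithms.
members = [DecisionTreeClassifier(random_state=0),
           LogisticRegression(max_iter=1000),
           GaussianNB(),
           KNeighborsClassifier()]
for clf in members:
    Xb, yb = undersample(X, y, rng)   # a fresh balanced set per member
    clf.fit(Xb, yb)

# Combine members by majority vote over their 0/1 predictions.
votes = np.stack([clf.predict(X) for clf in members])
pred = (votes.mean(axis=0) >= 0.5).astype(int)
```

A homogeneous baseline would instead reuse one algorithm (e.g. four decision trees) across all members; the paper's comparison varies exactly this choice while keeping the balancing scheme fixed.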
Pages: 15