Imbalance learning using heterogeneous ensembles

Cited by: 39
Authors
Zefrehi, Hossein Ghaderi [1 ]
Altincay, Hakan [1 ]
Affiliation
[1] Eastern Mediterranean University, Department of Computer Engineering, Famagusta, North Cyprus, via Mersin 10, Turkey
Keywords
Imbalance learning; Classifier ensembles; Bagging; Boosting; Heterogeneous ensembles; Multiple balancing methods
DOI
10.1016/j.eswa.2019.113005
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In binary classification, the class-imbalance problem arises when one class contains far more samples than the other; in such cases, a classifier's performance on the minority class is generally poor. Classifier ensembles are used to tackle this problem: each member is trained on a different balanced dataset obtained by randomly undersampling the majority class and/or randomly oversampling the minority class. Although the primary target of imbalance learning is the minority class, undersampling-based schemes employ the same minority sample set for all members, whereas oversampling the minority class is challenging because its structure is unclear. Heterogeneous ensembles, which combine multiple learning algorithms, have a higher potential for generating diverse members than homogeneous ones. This study addresses the use of heterogeneous ensembles for imbalance learning. Experiments on 66 datasets explore the relation between ensemble heterogeneity and performance, measured by AUC and the F1 score. The results show that performance improves as the number of classification methods is increased from one to five, and that heterogeneous ensembles achieve significantly higher scores than homogeneous ones. Multiple balancing schemes also contribute to the performance of some homogeneous and heterogeneous ensembles, although these improvements are not significant for either approach.
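To make the setup concrete, below is a minimal sketch (assuming Python with scikit-learn and NumPy, which the record does not specify) of a heterogeneous ensemble in which each member uses a different learning algorithm and is trained on its own balanced sample produced by randomly undersampling the majority class. The particular five base learners, the soft-voting combiner, and all names are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only (not the authors' code): a heterogeneous
    # ensemble whose members are different learning algorithms, each
    # trained on its own balanced sample obtained by randomly
    # undersampling the majority class. Soft voting is an assumption.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def balanced_undersample(X, y, rng):
        # Randomly undersample the majority class down to the minority size.
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        majority = classes[np.argmax(counts)]
        min_idx = np.where(y == minority)[0]
        maj_idx = rng.choice(np.where(y == majority)[0],
                             size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, maj_idx])
        return X[idx], y[idx]

    class HeterogeneousBalancedEnsemble:
        # Five classification methods, matching the largest heterogeneity
        # considered in the abstract (one to five methods).
        def __init__(self, seed=0):
            self.members = [
                DecisionTreeClassifier(random_state=seed),
                LogisticRegression(max_iter=1000),
                GaussianNB(),
                KNeighborsClassifier(),
                SVC(probability=True, random_state=seed),
            ]
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            self.classes_ = np.unique(y)
            for clf in self.members:
                # Draw a fresh balanced sample for every member.
                Xb, yb = balanced_undersample(X, y, self.rng)
                clf.fit(Xb, yb)
            return self

        def predict_proba(self, X):
            # Soft voting: average the members' class-probability estimates.
            return np.mean([clf.predict_proba(X) for clf in self.members], axis=0)

        def predict(self, X):
            return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

On a held-out test set, AUC and F1 could then be computed with sklearn.metrics.roc_auc_score and f1_score, the two measures used in the experiments.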
Pages: 15