Imbalance learning using heterogeneous ensembles

Cited by: 39
Authors
Zefrehi, Hossein Ghaderi [1 ]
Altincay, Hakan [1 ]
Affiliation
[1] Eastern Mediterranean University, Department of Computer Engineering, Famagusta, North Cyprus, via Mersin 10, Turkey
Keywords
Imbalance learning; Classifier ensembles; Bagging; Boosting; Heterogeneous ensembles; Multiple balancing methods
DOI
10.1016/j.eswa.2019.113005
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In binary classification, the class-imbalance problem arises when one class contains far more samples than the other; in such cases, a classifier's performance on the minority class is generally poor. Classifier ensembles are used to tackle this problem: each member is trained on a different balanced dataset obtained by randomly undersampling the majority class and/or randomly oversampling the minority class. Although the primary target of imbalance learning is the minority class, undersampling-based schemes employ the same minority sample set for all members, whereas oversampling the minority class is challenging because its structure is unclear. Heterogeneous ensembles, which combine multiple learning algorithms, have a higher potential for generating diverse members than homogeneous ones. This study addresses the use of heterogeneous ensembles for imbalance learning. Experiments on 66 datasets explore the relation between ensemble heterogeneity and performance, measured by AUC and the F1 score. The results show that performance improves as the number of classification methods is increased from one to five, and that heterogeneous ensembles achieve significantly higher scores than homogeneous ones. Multiple balancing schemes also contribute to the performance of some homogeneous and heterogeneous ensembles, although these improvements are not significant for either approach.
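To make the setup concrete, below is a minimal sketch (assuming Python with scikit-learn and NumPy, which the record does not specify) of a heterogeneous ensemble in which each member uses a different learning algorithm and is trained on its own balanced sample produced by randomly undersampling the majority class. The particular five base learners, the soft-voting combiner, and all names are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only (not the authors' code): a heterogeneous
    # ensemble whose members are different learning algorithms, each
    # trained on its own balanced sample obtained by randomly
    # undersampling the majority class. Soft voting is an assumption.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def balanced_undersample(X, y, rng):
        # Randomly undersample the majority class down to the minority size.
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        majority = classes[np.argmax(counts)]
        min_idx = np.where(y == minority)[0]
        maj_idx = rng.choice(np.where(y == majority)[0],
                             size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, maj_idx])
        return X[idx], y[idx]

    class HeterogeneousBalancedEnsemble:
        # Five classification methods, matching the largest heterogeneity
        # considered in the abstract (one to five methods).
        def __init__(self, seed=0):
            self.members = [
                DecisionTreeClassifier(random_state=seed),
                LogisticRegression(max_iter=1000),
                GaussianNB(),
                KNeighborsClassifier(),
                SVC(probability=True, random_state=seed),
            ]
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            self.classes_ = np.unique(y)
            for clf in self.members:
                # Draw a fresh balanced sample for every member.
                Xb, yb = balanced_undersample(X, y, self.rng)
                clf.fit(Xb, yb)
            return self

        def predict_proba(self, X):
            # Soft voting: average the members' class-probability estimates.
            return np.mean([clf.predict_proba(X) for clf in self.members], axis=0)

        def predict(self, X):
            return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

On a held-out test set, AUC and F1 could then be computed with sklearn.metrics.roc_auc_score and f1_score, the two measures used in the experiments.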
Pages: 15