One-class ensemble classifier for data imbalance problems

Cited by: 21
Authors
Hayashi, Toshitaka [1]
Fujita, Hamido [2, 3]
Affiliations
[1] Iwate Prefectural Univ, Fac Software & Informat Sci, Takizawa, Japan
[2] I Somet Org Inc Assoc, Morioka, Iwate, Japan
[3] Iwate Prefectural Univ, Reg Res Ctr, Takizawa, Japan
Keywords
Imbalanced data classification; One-class classification; Ensemble learning; One-class ensemble; SMOTE; COMPLEXITY; SELECTION; SUPPORT
DOI
10.1007/s10489-021-02671-1
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification is an important issue in machine learning. Despite various studies, the data imbalance problem remains difficult to solve. Because oversampling methods rely on fake minority data, they are not trustworthy and can cause security instability. The main objective of this paper is to improve accuracy in imbalanced data classification without generating fake minority data. For this purpose, a reliable strategy is proposed that uses an ensemble of one-class classifiers. Such a classifier does not suffer from data imbalance, since each model learns from a single class. Specifically, the training data is split into minority and majority sets. One-class classifiers are then trained separately on each set and applied to compute minority and majority scores for the test data. Finally, classification is made based on the combination of both scores. The proposed method is evaluated on the imbalanced-learn datasets, and the results are compared with sampling methods using Decision Tree and K Nearest Neighbors classifiers. The one-class ensemble classifier outperforms the sampling methods on 20 datasets.
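The split, train, score, and combine steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which one-class classifiers or which score-combination rule the paper uses, so everything below is an assumption made for illustration: `KnnOneClassScorer` is a hypothetical toy model that scores a point by its mean distance to the k nearest training points of one class, and `OneClassEnsemble` assumes the simplest possible combination rule (assign the class whose model gives the smaller distance score). The toy data is also invented.

```python
import math


class KnnOneClassScorer:
    """Hypothetical one-class model (not from the paper): scores a point by
    its mean distance to the k nearest training points of a single class.
    A lower score means the point looks more typical of that class."""

    def __init__(self, k=3):
        self.k = k
        self.points = []

    def fit(self, points):
        self.points = [tuple(p) for p in points]
        return self

    def score(self, x):
        dists = sorted(math.dist(x, p) for p in self.points)
        nearest = dists[: self.k]
        return sum(nearest) / len(nearest)


class OneClassEnsemble:
    """Trains one scorer per class and combines the two scores.
    The combination rule (smaller score wins) is an assumption."""

    def __init__(self, k=3, minority_label=1, majority_label=0):
        self.k = k
        self.minority_label = minority_label
        self.majority_label = majority_label

    def fit(self, X, y):
        # Split the training data into minority and majority sets.
        minority = [x for x, label in zip(X, y) if label == self.minority_label]
        majority = [x for x, label in zip(X, y) if label != self.minority_label]
        # Each model sees only one class, so class imbalance does not bias it.
        self.minority_model = KnnOneClassScorer(self.k).fit(minority)
        self.majority_model = KnnOneClassScorer(self.k).fit(majority)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            minority_score = self.minority_model.score(x)
            majority_score = self.majority_model.score(x)
            if minority_score < majority_score:
                predictions.append(self.minority_label)
            else:
                predictions.append(self.majority_label)
        return predictions


# Invented toy data: 100 majority points on a grid near the origin,
# 4 minority points near (10, 10). No synthetic minority samples are created.
majority_points = [(i * 0.5, j * 0.5) for i in range(10) for j in range(10)]
minority_points = [(10 + i * 0.5, 10 + j * 0.5) for i in range(2) for j in range(2)]
X_train = majority_points + minority_points
y_train = [0] * len(majority_points) + [1] * len(minority_points)

model = OneClassEnsemble(k=3).fit(X_train, y_train)
predictions = model.predict([(10.2, 10.1), (1.0, 2.0)])
print(predictions)  # [1, 0]
```

In this sketch the 25:1 class ratio has no effect on either scorer, which mirrors the abstract's argument that a model trained on a single class is not affected by imbalance. A real implementation would likely need to normalize the two scores before combining them, since scores from different one-class models are not generally on the same scale.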
Pages: 17073-17089
Page count: 17