Binarization With Boosting and Oversampling for Multiclass Classification

Cited by: 34
Authors
Sen, Ayon [1]
Islam, Md. Monirul [2]
Murase, Kazuyuki [3]
Yao, Xin [4]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, 1210 W Dayton St, Madison, WI 53706 USA
[2] Bangladesh Univ Engn & Technol, Dept Comp Sci & Engn, Dhaka 1000, Bangladesh
[3] Univ Fukui, Dept Human & Artificial Intelligence Syst, Fukui 9108507, Japan
[4] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Keywords
Binarization; boosting; multiclass classification; oversampling
DOI
10.1109/TCYB.2015.2423295
Chinese Library Classification
TP [Automation and computer technology]
Discipline Classification Code
0812
Abstract
Using a set of binary classifiers to solve multiclass classification problems has been a popular approach over the years. The decision boundaries learnt by binary classifiers (also called base classifiers) are much simpler than those learnt by multiclass classifiers. This paper proposes a new classification framework, termed binarization with boosting and oversampling (BBO), for efficiently solving multiclass classification problems. The new framework is devised based on the one-versus-all (OVA) binarization technique. Unlike most previous work, BBO employs boosting to handle hard-to-learn instances and oversampling to address the class-imbalance problem arising from OVA binarization. These two features distinguish BBO from other existing work. The new framework has been tested extensively on several multiclass supervised and semi-supervised classification problems using several base classifiers, including neural networks, C4.5, k-nearest neighbor, repeated incremental pruning to produce error reduction, support vector machine, random forest, and learning with local and global consistency. Experimental results show that BBO exhibits better performance than its counterparts on supervised and semi-supervised classification problems.
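To make the structure of such a framework concrete, the following Python sketch illustrates the general idea described in the abstract: each class is binarized against the rest (OVA), the resulting positive class is rebalanced by oversampling, and a boosted classifier is trained on each binary subproblem. This is only an illustrative approximation, not the authors' BBO implementation; the use of AdaBoost, duplication-based random oversampling, and the scikit-learn/Iris names below are assumptions made for the example.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier


def fit_ova_boost_oversample(X, y, random_state=0):
    """Fit one boosted binary classifier per class on an oversampled OVA split.

    Sketch only: BBO's actual boosting and oversampling procedures differ.
    """
    rng = np.random.default_rng(random_state)
    classifiers = {}
    for c in np.unique(y):
        y_bin = (y == c).astype(int)                 # OVA binarization: class c vs. rest
        pos = np.where(y_bin == 1)[0]
        neg = np.where(y_bin == 0)[0]
        # Random oversampling (duplication with replacement) of the positive
        # class so the binary subproblem is roughly balanced.
        extra = rng.choice(pos, size=max(len(neg) - len(pos), 0), replace=True)
        idx = np.concatenate([pos, neg, extra])
        clf = AdaBoostClassifier(n_estimators=50, random_state=random_state)
        clf.fit(X[idx], y_bin[idx])
        classifiers[c] = clf
    return classifiers


def predict_ova(classifiers, X):
    """Predict the class whose binary classifier is most confident."""
    classes = sorted(classifiers)
    scores = np.column_stack([classifiers[c].predict_proba(X)[:, 1] for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    models = fit_ova_boost_oversample(X, y)
    print("training accuracy:", (predict_ova(models, X) == y).mean())

The per-class loop is what creates the class imbalance the abstract refers to: for a problem with many classes, each "one" class is far smaller than the pooled "all" class, which is why rebalancing each binary subproblem before boosting is central to the approach.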
Pages: 1078-1091
Number of pages: 14