Evolving Diverse Ensembles Using Genetic Programming for Classification With Unbalanced Data

Times cited: 161
Authors
Bhowan, Urvesh [1]
Johnston, Mark [1]
Zhang, Mengjie [1]
Yao, Xin [2]
Affiliations
[1] Victoria Univ Wellington, Wellington 6140, New Zealand
[2] Univ Birmingham, Sch Comp Sci, Ctr Excellence Res Computat Intelligence & Applic, Birmingham B15 2TT, W Midlands, England
Keywords
Classification; class imbalance learning; genetic programming (GP); multiobjective machine learning (ML); strategies
DOI
10.1109/TEVC.2012.2199119
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In classification, machine learning algorithms can suffer a performance bias when data sets are unbalanced. A data set is unbalanced when at least one class (the minority class) is represented by only a small number of training examples, while the other class(es) make up the majority. In this scenario, classifiers can achieve good accuracy on the majority class but very poor accuracy on the minority class(es). This paper proposes a multiobjective genetic programming (MOGP) approach to evolving accurate and diverse ensembles of genetic program classifiers that perform well on both the minority and majority classes. The evolved ensembles comprise the nondominated solutions in the population, and individual members vote on class membership. This paper evaluates the effectiveness of two popular Pareto-based fitness strategies in the MOGP algorithm (SPEA2 and NSGA-II) and investigates techniques to encourage diversity among solutions in the evolved ensembles. Experimental results on six (binary) class imbalance problems show that the evolved ensembles outperform their individual members, as well as single-predictor methods such as canonical GP, naive Bayes, and support vector machines, on highly unbalanced tasks. This highlights the importance of developing an effective fitness evaluation strategy in the underlying MOGP algorithm to evolve good ensemble members.
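The ensemble idea in the abstract (keep the nondominated members of a population scored on two objectives, then let them vote) can be illustrated with a minimal sketch. It assumes the two objectives are per-class accuracies on the minority and majority classes, that labels are encoded as 1 = minority and 0 = majority, and that simple hypothetical threshold rules stand in for evolved GP programs. The tie-breaking rule and the toy data are assumptions made for illustration only. This sketch is not the paper's SPEA2 or NSGA-II fitness evaluation and contains no evolutionary loop.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an evolved GP classifier: maps a feature vector to a
# label, where 1 = minority class and 0 = majority class (assumed encoding).
Classifier = Callable[[Sequence[float]], int]


def class_accuracies(clf: Classifier, X, y) -> Tuple[float, float]:
    """Return (minority-class accuracy, majority-class accuracy) on labelled data."""
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    acc_min = sum(clf(x) == 1 for x in minority) / len(minority)
    acc_maj = sum(clf(x) == 0 for x in majority) / len(majority)
    return acc_min, acc_maj


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance when both objectives are maximised."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))


def nondominated(pop: List[Classifier], X, y) -> List[Classifier]:
    """Keep the members that no other member of the population dominates."""
    scores = [class_accuracies(c, X, y) for c in pop]
    return [
        c for i, c in enumerate(pop)
        if not any(dominates(scores[j], scores[i]) for j in range(len(pop)) if j != i)
    ]


def ensemble_predict(members: List[Classifier], x) -> int:
    """Majority vote; ties are broken towards the minority class (an assumption)."""
    votes = sum(m(x) for m in members)
    return 1 if 2 * votes >= len(members) else 0


# Toy unbalanced data (made up for illustration): 8 majority and 2 minority examples.
X = [[float(i)] for i in range(10)]
y = [1 if i >= 8 else 0 for i in range(10)]

# Hypothetical population of simple rules standing in for evolved programs.
population: List[Classifier] = [
    lambda x: 0,                                    # always predicts majority
    lambda x: int(x[0] >= 9),                       # conservative threshold
    lambda x: int(x[0] >= 7),                       # liberal threshold
    lambda x: int(x[0] >= 3),                       # very liberal threshold
    lambda x: int(x[0] < 2),                        # poor rule
    lambda x: int(x[0] >= 8 or x[0] == 2),          # good rule with one error
]

front = nondominated(population, X, y)


def ensemble(x) -> int:
    return ensemble_predict(front, x)


print([class_accuracies(m, X, y) for m in front])  # [(0.5, 1.0), (1.0, 0.875), (1.0, 0.875)]
print(class_accuracies(ensemble, X, y))             # (1.0, 1.0)
```

On this toy data the three nondominated members each make different errors, so their vote is correct on every example even though no single member is. This mirrors, in miniature, why diversity among ensemble members matters.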
Pages: 368-386
Page count: 19