Boosting for learning multiple classes with imbalanced class distribution

Citations: 197
Authors
Sun, Yanmin [1]
Kamel, Mohamed S.
Wang, Yang
Affiliations
[1] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
[2] Software Syst Ltd, Pattern Discovery, Waterloo, ON, Canada
Source
ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2006
DOI
10.1109/icdm.2006.29
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced class distributions significantly degrade the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. This learning difficulty has attracted considerable research interest. Most efforts concentrate on bi-class problems. However, the bi-class setting is not the only scenario in which the class imbalance problem prevails, and reported solutions for bi-class applications are not applicable to multi-class problems. In this paper we develop a cost-sensitive boosting algorithm to improve the classification performance on imbalanced data involving multiple classes. One barrier to applying the cost-sensitive boosting algorithm to imbalanced data is that the cost matrix is often unavailable for a problem domain. To solve this problem, we apply a genetic algorithm to search for the optimum cost setup of each class. Empirical tests show that the proposed cost-sensitive boosting algorithm significantly improves classification performance on imbalanced data sets.
Pages: 592-602
Number of pages: 11