Lazy bagging for classifying imbalanced data

Cited by: 15
Author
Zhu, Xingquan [1]
Affiliation
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA
Source
ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2007
DOI
10.1109/ICDM.2007.95
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a Lazy Bagging (LB) design, which builds bootstrap replicate bags based on the characteristics of the test instances. Upon receiving a test instance I_k, LB trims its bootstrap bags by taking I_k's nearest neighbors in the training set into consideration. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information that lets learners refine their local decision boundaries for classifying that instance. By taking full advantage of I_k's nearest neighbors, the base learners achieve lower bias and variance in classifying I_k. This strategy is beneficial for classifying imbalanced data because refining local decision boundaries helps a learner reduce its inherent bias towards the majority class and improve its performance on minority class examples. Our experimental results confirm that LB outperforms C4.5 and traditional bagging (TB) in reducing classification error, and, most importantly, that this error reduction comes largely from LB's improvement on minority class examples.
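The mechanism the abstract describes can be sketched as follows. This is a minimal illustrative variant, not the paper's exact procedure: it assumes each bootstrap bag is simply forced to contain the test instance's k nearest training neighbors, and it uses scikit-learn's CART tree in place of C4.5; the function name, bag-sizing rule, and parameter defaults are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lazy_bagging_predict(X_train, y_train, x_test, n_bags=10, k=5, rng=None):
    """Classify one test instance with a Lazy-Bagging-style ensemble.

    Each bootstrap bag is 'trimmed' so that it always includes x_test's
    k nearest training neighbors; every base tree therefore refines the
    local decision boundary around x_test.
    """
    rng = np.random.default_rng(rng)
    n = len(X_train)
    # Indices of the k nearest neighbors of x_test (Euclidean distance).
    dist = np.linalg.norm(X_train - x_test, axis=1)
    nn_idx = np.argsort(dist)[:k]
    votes = []
    for _ in range(n_bags):
        # Bootstrap sample of size n - k, then append the k neighbors
        # so the bag size stays n, as in standard bagging.
        boot = rng.integers(0, n, size=n - k)
        bag = np.concatenate([boot, nn_idx])
        tree = DecisionTreeClassifier().fit(X_train[bag], y_train[bag])
        votes.append(tree.predict(x_test.reshape(1, -1))[0])
    # Majority vote over the base trees.
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]
```

Because the neighbors are injected into every bag, a minority-class test instance surrounded by minority-class neighbors is guaranteed local representation in each base learner's training set, which is the intuition behind LB's gains on minority class examples.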
Pages: 763-768
Page count: 6
References
21 records in total
[1] Aha, D. W., 1991, Machine Learning, V6, P37, DOI 10.1007/BF00153759
[2] Aha, D. W., 1997, AI Review, V11
[3] [Anonymous], 1993, C4.5: Programs for Machine Learning
[4] [Anonymous], Machine Learning
[5] Bauer, E.; Kohavi, R., An empirical comparison of voting classification algorithms: Bagging, boosting, and variants, Machine Learning, 1999, 36(1-2), P105-139
[6] Bay, S. D., Proc. 15th ICML Conf.
[7] Breiman, L., 1996, Tech. Rep. 460, UC Berkeley
[8] Blake, C., 1998, UCI Data Repository
[9] Chawla, N., 2002, J. AI Res.
[10] Fern, X. Z., 2003, Proc. ICML