Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data

Cited by: 8
Authors
Li, Fengqi [1]
Yu, Chuang [1]
Yang, Nanhai [1]
Xia, Feng [1]
Li, Guangming [1]
Kaveh-Yazdy, Fatemeh [1]
Affiliations
[1] Dalian Univ Technol, Sch Software, Dalian 116620, Peoples R China
DOI
10.1155/2013/875450
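Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract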
Transductive graph-based semi-supervised learning methods usually build an undirected graph that uses both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to their neighbors along the graph edges to predict the labels of unlabeled samples. Most popular semi-supervised learning approaches are sensitive to the initial label distribution, which is skewed in imbalanced labeled datasets. In imbalanced classification, the class boundary is severely skewed by the majority classes. In this paper, we propose a simple and effective approach that alleviates the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes, forming a balanced labeled dataset for the learning methods applied afterwards. Experiments on UCI datasets and the MNIST handwritten digits dataset show that the proposed approach outperforms other existing state-of-the-art methods.
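The balancing idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection rule, so the one used here is an assumption rather than the authors' algorithm: at each step, the smallest labeled class claims the unlabeled sample closest to any of its labeled members, and the loop stops once all labeled classes have the same size. The function name `balance_labels`, the `-1` marker for unlabeled samples, and the Euclidean distance are all hypothetical choices made for illustration.

```python
import numpy as np


def balance_labels(X, y, max_iter=1000):
    """Pseudo-label unlabeled samples (y == -1) until labeled classes are equal in size.

    Assumed rule (not taken from the paper): the smallest labeled class
    takes the unlabeled sample nearest to any of its labeled members.
    """
    X = np.asarray(X, dtype=float)
    y = np.array(y, dtype=int)  # copy, so the caller's labels are untouched
    classes = np.unique(y[y >= 0])
    for _ in range(max_iter):
        counts = np.array([(y == c).sum() for c in classes])
        unlabeled = np.flatnonzero(y == -1)
        # Stop when the labeled set is balanced or nothing is left to add.
        if counts.min() == counts.max() or unlabeled.size == 0:
            break
        c = classes[counts.argmin()]  # current minority class
        members = X[y == c]
        # Distance from each unlabeled sample to its nearest member of class c.
        diffs = X[unlabeled, None, :] - members[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        y[unlabeled[nearest.argmin()]] = c
    return y
```

For example, with three labeled samples of class 0 near 0.0, one labeled sample of class 1 at 5.0, and unlabeled samples at 4.8, 5.3, 0.5, and 9.0, the sketch assigns 4.8 and then 5.3 to class 1 and leaves the other two unlabeled. The balanced label vector could then be passed to a graph-based label propagation method.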
Pages: 9