Nearest neighbor editing aided by unlabeled data

Cited by: 61
Authors
Guan, Donghai [1]
Yuan, Weiwei [1]
Lee, Young-Koo [1]
Lee, Sungyoung [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Yongin 446701, South Korea
Keywords
Nearest neighbor editing; Unlabeled data; Edited nearest neighbor; Repeated edited nearest neighbor; All k-NN; FRAMEWORK
DOI
10.1016/j.ins.2009.02.011
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a novel method for nearest neighbor editing. Nearest neighbor editing aims to increase the classifier's generalization ability by removing noisy instances from the training set. Traditionally, nearest neighbor editing edits (removes or retains) each instance by the voting of the labeled instances in the training set. Motivated by semi-supervised learning, we instead propose an editing methodology that edits each training instance by the voting of all available instances, both labeled and unlabeled. We expect that editing performance can be boosted by using unlabeled data appropriately. The idea relies on the fact that in many applications, in addition to the training instances, many unlabeled instances are also available, since they require no human annotation effort. Three popular data editing methods, edited nearest neighbor, repeated edited nearest neighbor and All k-NN, are adopted to verify the idea. They are tested on a set of UCI data sets. Experimental results indicate that all three editing methods achieve improved performance with the aid of unlabeled data. Moreover, the improvement is more pronounced when the ratio of training data to unlabeled data is small. (C) 2009 Elsevier Inc. All rights reserved.
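The editing idea in the abstract can be illustrated with a minimal sketch of Wilson-style edited nearest neighbor (ENN), optionally extended with unlabeled points. The abstract does not say how unlabeled instances take part in the vote, so this sketch assumes that each unlabeled point first receives a pseudo-label from its k nearest labeled neighbors and then joins the voting pool. That pseudo-labeling step, and the names `knn_vote`, `enn_edit` and `X_unlabeled`, are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def knn_vote(query, points, labels, k, skip=None):
    """Majority label among the k points nearest to `query`.

    `skip` is an index into `points` to leave out (the query itself).
    Ties are broken in favor of the label seen first, i.e. the nearer point.
    """
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def enn_edit(X, y, k=3, X_unlabeled=()):
    """Return the indices of labeled instances retained by ENN.

    With X_unlabeled empty this is plain ENN: an instance is kept only if
    the majority of its k nearest labeled neighbors share its label.
    With X_unlabeled given, each unlabeled point is pseudo-labeled by k-NN
    on the labeled set (an assumption of this sketch) and then votes too.
    """
    pseudo = [knn_vote(u, X, y, k) for u in X_unlabeled]
    pool_X = list(X) + list(X_unlabeled)
    pool_y = list(y) + pseudo
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Index i is the same in X and in the pool, so skip=i excludes xi.
        if knn_vote(xi, pool_X, pool_y, k, skip=i) == yi:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two clusters; the last point lies in cluster "a" but is labeled "b".
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
         (0.05, 0.05)]
    y = ["a", "a", "a", "b", "b", "b", "b"]
    U = [(0.02, 0.02), (0.98, 0.98)]
    print(enn_edit(X, y, k=3))
    print(enn_edit(X, y, k=3, X_unlabeled=U))
```

Repeated ENN would call `enn_edit` on the retained subset until no instance is removed, and All k-NN would repeat the vote for each neighborhood size from 1 to k. Both are left out to keep the sketch short.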
Pages: 2273-2282
Number of pages: 10