Using representative-based clustering for nearest neighbor dataset editing

Cited by: 16
Authors
Eick, CF [1]
Zeidat, EN [1]
Vilalta, R [1]
Affiliations
[1] Univ Houston, Dept Comp Sci, Houston, TX 77004 USA
Source
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2004
Keywords
DOI
10.1109/ICDM.2004.10044
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The goal of dataset editing in instance-based learning is to remove objects from a training set in order to increase the accuracy of a classifier. For example, Wilson editing removes training examples that are misclassified by a nearest neighbor classifier so as to smooth the shape of the resulting decision boundaries. This paper revolves around the use of representative-based clustering algorithms for nearest neighbor dataset editing. We term this approach supervised clustering editing. The main idea is to replace a dataset by a set of cluster prototypes. A novel clustering approach called supervised clustering is introduced for this purpose. Our empirical evaluation using eight UCI datasets shows that both Wilson and supervised clustering editing improve accuracy on more than 50% of the datasets tested. However, supervised clustering editing achieves four times higher compression rates than Wilson editing.
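The Wilson editing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance, a leave-one-out k-nearest-neighbor vote, and k = 3; the function name `wilson_edit` and these parameter choices are illustrative assumptions, not details taken from the paper. It does not cover supervised clustering editing, whose prototype selection the abstract does not specify.

```python
import math
from collections import Counter


def wilson_edit(points, labels, k=3):
    """Return indices of examples kept after one pass of Wilson-style editing.

    Assumption: an example is dropped when the majority label of its k
    nearest neighbors (excluding itself) differs from its own label.
    """
    kept = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Rank every other example by Euclidean distance to p (leave-one-out).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        # Majority vote among the k closest neighbors.
        votes = Counter(labels[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted == y:
            kept.append(i)
    return kept
```

Calling `wilson_edit(points, labels)` on a list of coordinate tuples and a parallel list of class labels yields the indices that survive editing; a nearest neighbor classifier would then be built from those examples only.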
Pages: 375-378
Page count: 4