An Improved Sample Selection Algorithm in Fuzzy Decision Tree Induction

Cited by: 5
Authors
Dong, Ling-Cai [1]
Wang, Dan [1]
Wang, Xi-Zhao [1]
Affiliations
[1] Hebei Univ, Coll Math & Comp Sci, Baoding, Peoples R China
Source
2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9 | 2009
Keywords
Sample selection; probability distribution; similarity; classification ambiguity; fuzzy decision tree;
DOI
10.1109/ICSMC.2009.5346654
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper improves a sample selection method based on maximum entropy. Unlike the original method, the improved one takes the probability distribution of the unlabeled instances into consideration. It selects the instances that can greatly reduce the uncertainty of the whole unlabeled set. The uncertainty reduction that an instance causes in the whole unlabeled set is measured by the instance's own uncertainty and its influence index on the whole unlabeled set. To calculate the influence index conveniently, we introduce a similarity matrix whose elements are similarities measured by the distances between instances. The new method avoids a drawback of the original method, which may select abnormal or isolated samples. It can therefore select instances that are more representative and more robust to noise. Our experimental results show that a classifier built from samples selected by the new algorithm performs better than one built from samples selected by the original method, at the same time complexity.
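The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: uncertainty is taken as Shannon entropy of the class-membership probabilities, similarity as `1 / (1 + d)` for Euclidean distance `d`, the influence index as the mean of a row of the similarity matrix, and the final score as the product of uncertainty and influence. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (base 2) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log2(p), axis=1)

def similarity_matrix(X):
    """Pairwise similarity from Euclidean distance.

    The form 1 / (1 + d) is an assumption; the abstract only says the
    similarities are measured by distances between instances.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    return 1.0 / (1.0 + dist)

def select_instance(X, probs):
    """Pick the unlabeled instance with the largest uncertainty * influence.

    The influence index is assumed to be the mean similarity of an
    instance to all unlabeled instances (including itself).
    """
    uncertainty = entropy(probs)
    influence = similarity_matrix(X).mean(axis=1)
    scores = uncertainty * influence
    return int(np.argmax(scores)), scores

# Toy unlabeled set: three clustered points and one isolated outlier.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
# Hypothetical classifier outputs; the outlier is the most uncertain.
probs = np.array([[0.6, 0.4], [0.6, 0.4], [0.6, 0.4], [0.5, 0.5]])

max_entropy_pick = int(np.argmax(entropy(probs)))
weighted_pick, scores = select_instance(X, probs)
print(max_entropy_pick, weighted_pick)
```

On this toy data, plain maximum-entropy selection picks the isolated point, while the similarity-weighted score picks a point inside the cluster. That is the behavior the abstract attributes to the improved method, although the paper's actual weighting may differ from the product used here.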
Pages: 629-634
Page count: 6