Semi-supervised support vector machines for unlabeled data classification

Cited by: 103
Authors
Fung, G [1]
Mangasarian, OL [1]
Affiliation
[1] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Keywords
unlabeled data; classification; support vector machines
DOI
10.1080/10556780108805809
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
A concave minimization approach is proposed for classifying unlabeled data based on the following ideas: (i) A small representative percentage (5% to 10%) of the unlabeled data is chosen by a clustering algorithm and given to an expert or oracle to label. (ii) A linear support vector machine is trained using the small labeled sample while simultaneously assigning the remaining bulk of the unlabeled dataset to one of two classes so as to maximize the margin (distance) between the two bounding planes that determine the separating plane midway between them. This latter problem is formulated as a concave minimization problem on a polyhedral set for which a stationary point is quickly obtained by solving a few (5 to 7) linear programs. Such stationary points turn out to be very effective as evidenced by our computational results which show that clustered concave minimization yields: (a) Test set improvement as high as 20.4% over a linear support vector machine trained on a correspondingly small but randomly chosen subset that is labeled by an expert. (b) Test set correctness averaged to within 5.1% when compared to that of a completely supervised linear support vector machine trained on the entire dataset which has been labeled by an expert.
Pages: 29-44
Page count: 16
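
Step (i) of the abstract, choosing a small representative subset for an expert to label, can be illustrated with a toy sketch. The abstract only says "a clustering algorithm", so everything concrete here is an assumption: the use of k-means with a farthest-point start, the hypothetical helper names `farthest_point_init` and `pick_representatives`, and the ten made-up 2-D points. It does not attempt step (ii), the concave minimization solved by successive linear programs.

```python
import math


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centroids chosen so far (an assumption, not from the paper)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def pick_representatives(points, k, iters=20):
    """Run plain k-means, then return the sorted indices of the points
    closest to each final centroid. These are the points one would hand
    to the expert or oracle for labeling."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    reps = {
        min(range(len(points)), key=lambda i: math.dist(points[i], c))
        for c in centroids
    }
    return sorted(reps)


# Made-up data: two well-separated groups of five points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
to_label = pick_representatives(points, k=2)
print(to_label)  # prints [4, 9]
```

On this toy set the two selected indices are the central point of each group. A real run would use many more clusters so that the labeled share falls in the 5% to 10% range the abstract mentions.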