Progressive similarity transductive support vector machine algorithm for small sample text classification

Cited by: 0
Authors
Ma, Jianbin [1]
Li, Ying [2]
Affiliations
[1] College of Information Science and Technology, Agricultural University of Hebei, Hebei, 071001, Baoding
[2] College of Economic and Trade, Agricultural University of Hebei, Hebei, 071001, Baoding
Keywords
PSTSVM; Small sample; Support Vector Machine; Text classification
DOI
10.3923/itj.2013.7673.7676
Abstract
The Support Vector Machine (SVM) algorithm is widely applied to text classification. However, SVM has difficulty labeling samples correctly when few training samples are available. The Transductive Support Vector Machine (TSVM) was therefore introduced to minimize misclassification of test samples by training on both labeled and unlabeled samples. In TSVM training, however, the parameter N (the number of positive samples) must be supplied manually, and it is difficult to estimate. This study introduces the Progressive Similarity Transductive Support Vector Machine (PSTSVM), which labels the most likely unlabeled samples pairwise by similarity computation and then retrains to readjust the hyperplane. Experimental results on the Reuters dataset showed that the PSTSVM algorithm was effective on a mixed training set of labeled and unlabeled samples. © 2013 Asian Network for Scientific Information.
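The abstract describes the PSTSVM loop only at a high level: label the most likely unlabeled samples pairwise by similarity, then retrain. The following Python sketch illustrates one possible reading of that loop and is not the authors' implementation. Several details are assumptions not stated in the abstract: cosine similarity as the similarity measure, "pairwise" taken to mean one new positive and one new negative sample per round, a plain hinge-loss subgradient trainer standing in for a full SVM solver, and the toy two-feature vectors. All function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Stand-in linear SVM: per-sample subgradient steps on the
    L2-regularised hinge loss. A real system would use an SVM solver."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * decision(w, b, xi)
            w = [wj * (1 - lr * lam) for wj in w]  # shrink from the regulariser
            if margin < 1:  # sample violates the margin: move toward it
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def pstsvm_sketch(X_lab, y_lab, X_unl):
    """Progressively label unlabeled samples, one candidate per class per
    round, retraining after each round (assumed selection rule)."""
    X, y = list(X_lab), list(y_lab)
    pool = list(range(len(X_unl)))
    assigned = {}
    w, b = train_linear_svm(X, y)
    while pool:
        picks = {}
        for label in (1, -1):
            members = [x for x, lab in zip(X, y) if lab == label]
            # Unlabeled sample most similar to any labeled sample of this class.
            picks[label] = max(
                pool, key=lambda i: max(cosine(X_unl[i], m) for m in members)
            )
        if picks[1] == picks[-1]:
            # Same sample is closest to both classes: let the hyperplane decide.
            i = picks[1]
            picks = {predict(w, b, X_unl[i]): i}
        for label, i in picks.items():
            X.append(X_unl[i])
            y.append(label)
            assigned[i] = label
            pool.remove(i)
        w, b = train_linear_svm(X, y)  # readjust the hyperplane
    return w, b, assigned

# Toy data: feature 0 dominates the positive class, feature 1 the negative.
X_lab = [[1.0, 0.1], [0.1, 1.0]]
y_lab = [1, -1]
X_unl = [[0.9, 0.2], [0.2, 0.8], [0.7, 0.3], [0.3, 0.9]]

w, b, assigned = pstsvm_sketch(X_lab, y_lab, X_unl)
print(assigned)
print(predict(w, b, [0.8, 0.1]), predict(w, b, [0.1, 0.8]))
```

Unlike TSVM, this loop never asks for the number of positive samples N: the class balance among the unlabeled samples emerges from the labels assigned round by round, which is the motivation the abstract gives for the progressive approach.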
Pages: 7673-7676
Number of pages: 3