SVM-based interactive document retrieval with active learning

Cited by: 6
Authors
Onoda T. [1]
Murata H. [1]
Yamada S. [2]
Affiliations
[1] Central Research Institute of Electric Power Industry, Komae, Tokyo 201-8511, 2-11-1, Iwado Kita
[2] National Institute of Informatics, SOKENDAI, Chiyoda, Tokyo 101-8430, 2-1-1, Hitotsubashi
Keywords
Active learning; Document retrieval; Relevance feedback; Support vector machines
DOI
10.1007/s00354-007-0034-4
Abstract
This paper describes an application of support vector machines (SVMs) to interactive document retrieval using active learning. Earlier work has applied classification learners such as SVMs to relevance feedback with successful results. However, that work did not fully exploit the characteristics of the example distribution in document retrieval. We propose a heuristic that biases which documents are shown to the user for judgment according to the distribution of examples in document retrieval. The heuristic selects the examples shown to the user from the neighborhood of positive support vectors, and it improves learning efficiency. We implemented an SVM-based interactive document retrieval system using the proposed heuristic and compared it with conventional systems: a Rocchio-based system and an SVM-based system without the heuristic. We conducted systematic experiments on large data sets containing over 500,000 newspaper articles and confirmed that our system outperformed the others. © Ohmsha, Ltd. and Springer 2008.
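The selection step named in the abstract (showing the user documents that lie near positive support vectors) could look roughly like the sketch below. The abstract does not give the paper's actual selection rule, so everything here is an assumption made for illustration: the use of cosine similarity, the dense list-of-floats document vectors, and the hypothetical names `cosine` and `select_near_positive_svs`. The positive support vectors are assumed to come from an SVM already trained on the user's earlier relevance judgments.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_near_positive_svs(positive_svs, unlabeled, k):
    """Return indices of the k unlabeled documents closest to any positive
    support vector (assumed similarity: cosine; not taken from the paper)."""
    scored = []
    for idx, doc in enumerate(unlabeled):
        # Score each candidate by its best similarity to a positive support vector.
        best = max((cosine(doc, sv) for sv in positive_svs), default=0.0)
        scored.append((best, idx))
    # Highest similarity first; ties broken by original index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Toy data: one positive support vector and four unjudged documents.
positive_svs = [[1.0, 0.0, 0.2]]
unlabeled = [
    [0.9, 0.1, 0.1],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 0.3],
]
picked = select_near_positive_svs(positive_svs, unlabeled, k=2)
```

In an interactive loop, the documents at the `picked` indices would be shown to the user, their judgments added to the training set, and the SVM retrained before the next round.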
Pages: 49-61
Page count: 12