A Novel Active Learning Method Using SVM for Text Classification

Cited by: 173
Authors
Goudjil M. [1]
Koudil M. [1]
Bedda M. [2]
Ghoggali N. [3]
Affiliations
[1] École nationale Supérieure d’Informatique (ESI), Oued Smar, Algiers
[2] AL Jouf University, Sakaka
[3] LAAAS laboratory, Faculté de Technologie, Université Batna 2, Batna
Keywords
active learning; pairwise coupling; pool-based active learning; support vector machine (SVM); text categorization
DOI
10.1007/s11633-015-0912-z
Abstract
Support vector machines (SVMs) are a popular class of supervised learning algorithms and are particularly applicable to large, high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution to this issue is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising classification accuracy, by selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and these samples are then manually labeled by an expert. Experimental results indicate that the proposed active learning method significantly reduces the labeling effort while simultaneously enhancing classification accuracy. © 2016, Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature.
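The abstract states only that a batch is chosen from the posterior probabilities of multi-class SVMs (and the keywords mention pairwise coupling). It does not give the coupling rule or the selection criterion. The Python sketch below assumes one plausible combination: one-vs-one SVM probability estimates are coupled into multi-class posteriors with the closed-form rule of Price et al., and the batch consists of the pool documents whose two largest posteriors are closest (a best-versus-second-best margin criterion). The input format and the names `couple_pairwise`, `margin`, and `select_batch` are hypothetical, and the paper's own selection rule may differ.

```python
from typing import Dict, List, Sequence, Tuple

# Sketch only: the coupling rule and the margin criterion are assumptions,
# not necessarily the method used in the paper.
# Hypothetical input: for classes i < j, r[(i, j)] is the one-vs-one SVM's
# estimate of P(y = i | y in {i, j}, x) for a single document.
PairwiseProbs = Dict[Tuple[int, int], float]


def couple_pairwise(r: PairwiseProbs, n_classes: int) -> List[float]:
    """Combine pairwise probabilities into multi-class posteriors.

    Closed-form rule of Price et al.:
        p_i = 1 / (sum_{j != i} 1 / r_ij - (k - 2)),
    renormalised so that the posteriors sum to 1.
    """
    eps = 1e-12  # guards against division by zero when r_ij == 0
    raw = []
    for i in range(n_classes):
        s = 0.0
        for j in range(n_classes):
            if j == i:
                continue
            # r_ji = 1 - r_ij, so only pairs with i < j need to be stored.
            r_ij = r[(i, j)] if i < j else 1.0 - r[(j, i)]
            s += 1.0 / max(r_ij, eps)
        # Each 1 / r_ij >= 1, so s >= k - 1 and the denominator is >= 1.
        raw.append(1.0 / (s - (n_classes - 2)))
    total = sum(raw)
    return [p / total for p in raw]


def margin(posteriors: Sequence[float]) -> float:
    """Gap between the two largest posteriors; a small gap means uncertainty."""
    top = sorted(posteriors, reverse=True)
    return top[0] - top[1]


def select_batch(pool: List[List[float]], batch_size: int) -> List[int]:
    """Indices of the batch_size pool documents with the smallest margins."""
    order = sorted(range(len(pool)), key=lambda idx: margin(pool[idx]))
    return order[:batch_size]


# Toy pool of three unlabeled documents over three classes.
pool = [
    couple_pairwise({(0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.5}, 3),  # confident
    couple_pairwise({(0, 1): 0.5, (0, 2): 0.5, (1, 2): 0.5}, 3),  # ambiguous
    couple_pairwise({(0, 1): 0.6, (0, 2): 0.7, (1, 2): 0.6}, 3),  # in between
]
print(select_batch(pool, 2))  # → [1, 2]
```

In a full pool-based loop, the selected documents would be labeled by the expert, added to the training set, and the SVMs retrained before the next batch is drawn. The margin criterion is used here only because it depends on nothing but the posteriors; entropy over the posteriors would slot into `margin`'s place just as easily.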
Pages: 290–298
Page count: 8