A new term weighting scheme for text categorisation

Cited by: 0
Author
Barigou, Fatiha [1]
Affiliation
[1] Laboratory of Computer Science of Oran, Department of Computer Science, University of Oran, 1, Ahmed Ben Bella, Oran
Keywords
K nearest neighbours; KNN; Supervised term weighting scheme; Term weighting; Text categorisation
DOI
10.1504/IJISTA.2015.074332
Abstract
Term weighting schemes have recently attracted growing attention from researchers in text categorisation (TC). Unlike information retrieval, TC is a supervised learning task that makes use of prior information about the distribution of training documents across predefined categories. This information, which traditional weighting schemes omit, is considered very useful and has been widely used for term selection and for building classifiers. This paper studies and analyses a new weighting measure intended to improve the performance of k nearest neighbours (kNN)-based TC. Copyright © 2015 Inderscience Enterprises Ltd.
Pages: 256-272
Number of pages: 16