Dimensionality reduction by combining category information and latent semantic index for text categorization

Cited by: 0
Authors
Zheng, Wenbin [1]
An, Lixin [1, 2]
Xu, Zhanyi [1]
Affiliations
[1] College of Information Engineering, China Jiliang University
[2] College of Textiles, Donghua University
Source
Journal of Information and Computational Science | 2013, Vol. 10, No. 08
Keywords
Category information; Dimensionality reduction; Latent semantic indexing; Text categorization;
DOI
10.12733/jics20101814
Abstract
Latent Semantic Indexing (LSI) is a commonly used dimensionality reduction method in text categorization; however, as a linear reconstruction method, its goal is to obtain the optimal representative features rather than the optimal classification features. This paper proposes a novel method in which category information is combined with latent semantic indexing to obtain more discriminative features than standard latent semantic indexing. The experimental results show that the proposed method achieves good performance on two benchmark data sets, especially when the dimensionality is greatly reduced. Copyright © 2013 Binary Information Press.
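For readers unfamiliar with the baseline, the following is a minimal sketch of standard LSI as a truncated SVD projection, which is the "linear reconstruction" the abstract refers to. It is not the method proposed in the paper: the abstract does not say how category information is combined, so that step is not shown. The function name `lsi_reduce`, the toy matrix `X`, and the choice `k=2` are illustrative assumptions.

```python
import numpy as np

def lsi_reduce(X, k):
    """Project a documents-by-terms matrix X onto its top-k latent dimensions.

    Hypothetical helper for illustration; plain LSI only, no category information.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T          # terms-by-k basis of the latent space
    return X @ Vk, Vk      # reduced documents (n_docs x k) and the basis

# Toy term-count matrix (assumed data): 4 documents, 4 terms, two topical blocks.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])

Z, Vk = lsi_reduce(X, k=2)
X_hat = Z @ Vk.T                       # best rank-2 reconstruction of X
error = np.linalg.norm(X - X_hat)      # Frobenius reconstruction error
```

The projection minimizes reconstruction error without looking at class labels, which is why the resulting features are representative but not necessarily discriminative.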
Pages: 2463-2469
Number of pages: 6