A Note on the Effect of Term Weighting on Selecting Intrinsic Dimensionality of Data

Citations: 0
Authors
Kumar, Ch. Aswani [1]
Srinivas, S. [2]
Affiliations
[1] VIT Univ, Sch Comp Sci, Intelligent Syst Div, Vellore 632014, Tamil Nadu, India
[2] VIT Univ, Sch Sci & Human, Div Appl Math, Vellore 632014, Tamil Nadu, India
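Keywords
Dimensionality selection; Latent semantic indexing; Singular value decomposition; Term weighting
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract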
The effect of term weighting on selecting the intrinsic dimensionality of data is discussed. Experiments are conducted, using different term weighting and dimensionality selection methods, on four test document collections (Medline, Cranfield, CACM and CISI). The results indicate that transforming the data matrix with a term weighting scheme plays a vital role in identifying the intrinsic dimensionality.
Pages: 5-12
Number of pages: 8
Related papers
13 in total
[1]  
Aswani Kumar C., 2006, International Journal of Applied Mathematics and Computer Science, P551
[2]   Matrices, vector spaces, and information retrieval [J].
Berry, MW ;
Drmac, Z ;
Jessup, ER .
SIAM REVIEW, 1999, 41 (02) :335-362
[3]  
Debole F, 2004, STUD FUZZ SOFT COMP, V138, P81
[4]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[6]   Eigenvalue-based model selection during latent semantic indexing [J].
Efron, M .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2005, 56 (09) :969-988
[7]  
Eric C., 1999, ORNLTM13756
[8]  
Hui Fang, 2004, Proceedings of Sheffield SIGIR 2004. The Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P49
[9]   Term norm distribution and its effects on Latent Semantic Indexing [J].
Husbands, P ;
Simon, H ;
Ding, C .
INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (04) :777-787
[10]  
Kumar Ch, 2009, J COMPUTING INFORM T