Term norm distribution and its effects on Latent Semantic Indexing

Cited by: 9
Authors
Husbands, P [1]
Simon, H [1]
Ding, C [1]
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, Berkeley, CA 94720 USA
Keywords
information retrieval; LSI; TREC
DOI
10.1016/j.ipm.2004.03.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Semantic Indexing (LSI) uses the singular value decomposition to reduce noisy dimensions and improve the performance of text retrieval systems. Preliminary results have shown modest improvements in retrieval accuracy and recall, but these have mainly explored small collections. In this paper we investigate text retrieval on a larger document collection (TREC) and focus on distribution of word norm (magnitude). Our results indicate the inadequacy of word representations in LSI space on large collections. We emphasize the query expansion interpretation of LSI and propose an LSI term normalization that achieves better performance on larger collections. (c) 2004 Elsevier Ltd. All rights reserved.
Pages: 777-787
Number of pages: 11
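
The abstract above refers to the norm (magnitude) of term representations in LSI space and to a term normalization. The following is a minimal sketch of those two quantities, not the authors' method: it assumes terms are represented by the rows of U_k S_k from a rank-k SVD of a term-by-document matrix, and it assumes the normalization rescales each term vector to unit length. The function names and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def lsi_term_vectors(A, k):
    """Rank-k term vectors from a terms x documents matrix A.

    Assumption: term i is represented by row i of U_k * diag(s_k).
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]

def term_norms(T):
    """Euclidean norm of each term vector (one value per row)."""
    return np.linalg.norm(T, axis=1)

def normalize_terms(T, eps=1e-12):
    """Rescale every term vector to unit length.

    Assumption: this unit-length rescaling stands in for the paper's
    normalization, whose exact form the abstract does not give.
    """
    return T / np.maximum(term_norms(T), eps)[:, None]

# Hypothetical toy data: 4 terms x 3 documents of raw counts.
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 1.0]])

T = lsi_term_vectors(A, k=2)
print("term norms in LSI space:", term_norms(T))
print("after normalization:    ", term_norms(normalize_terms(T)))
```

In this sketch the truncated term norms can never exceed the corresponding row norms of the original matrix, because each truncated term vector keeps only the first k components of the full expansion.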