Static pruning of terms in inverted files

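Cited by: 0
Authors
Blanco, Roi [1]
Barreiro, Alvaro [1]
Affiliation
[1] Univ A Coruna, Dept Comp Sci, IRLab, La Coruna, Spain
Source
ADVANCES IN INFORMATION RETRIEVAL | 2007 / Vol. 4425
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract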
This paper addresses the problem of identifying collection-dependent stop-words in order to reduce the size of inverted files. We present four methods to automatically recognise stop-words, analyse the tradeoff between efficiency and effectiveness, and compare them with a previous pruning approach. The experiments allow us to conclude that, in some situations, stop-word pruning is competitive with other inverted file reduction techniques.
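
The abstract does not say which four methods the paper uses, so the following is only a minimal sketch of the general idea of static term pruning. It assumes a toy in-memory inverted file and ranks stop-word candidates by plain document frequency. That criterion is a stand-in chosen for illustration, not one of the authors' methods. The names `build_index`, `prune_stopwords` and `num_postings` are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def prune_stopwords(index, k):
    """Drop the posting lists of the k terms with the highest document frequency.

    Document frequency is an assumed ranking criterion used only for this sketch.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(index, key=lambda t: (-len(index[t]), t))
    stopwords = set(ranked[:k])
    pruned = {t: p for t, p in index.items() if t not in stopwords}
    return pruned, stopwords

def num_postings(index):
    """Total number of postings, used here as a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())

docs = [
    "the index stores the postings",
    "the query scans the index",
    "pruning the index saves space",
]
index = build_index(docs)
pruned, stopwords = prune_stopwords(index, k=2)
print(sorted(stopwords), num_postings(index), num_postings(pruned))
```

On this toy collection, removing the two most frequent terms cuts the number of postings from 13 to 7. The efficiency versus effectiveness tradeoff mentioned in the abstract would appear as the loss in retrieval quality for queries that contain the pruned terms, which this sketch does not measure.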
Pages: 64 / +
Page count: 2