Incremental cluster-based retrieval using compressed cluster-skipping inverted files

被引:22
作者
Altingovde, Ismail Sengor [1 ]
Demir, Engin [1 ]
Can, Fazli [1 ]
Ulusoy, Oezguer [1 ]
机构
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
关键词
experimentation; measurement; performance; best match; cluster-based retrieval (CBR); cluster-skipping inverted index structure (CS-IIS); full search (FS); index compression; inverted index structure (IIS); query processing;
D O I
10.1145/1361684.1361688
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental- CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
引用
收藏
页数:36
相关论文
共 52 条
[1]  
ALTINGOVDE IS, 2007, P 30 ANN INF RETR, P891
[2]  
Altingovde IS, 2006, LECT NOTES COMPUT SC, V4263, P707
[3]  
ANH NV, 2001, P 24 ANN INT ACM SIG, P35
[4]  
Anh V. N., 2006, Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P372, DOI 10.1145/1148170.1148235
[5]   Inverted index compression using word-aligned binary codes [J].
Anh, VN ;
Moffat, A .
INFORMATION RETRIEVAL, 2005, 8 (01) :151-166
[6]  
ANH VN, 2005, P 28 ANN INT ACM SIG, P226
[7]  
[Anonymous], 2007, P 30 ANN INT ACM SIG
[8]  
[Anonymous], P 27 INT ACM SIGIR C
[9]  
[Anonymous], 1994, MANAGING GIGABYTES C
[10]  
Baeza-Yates R., 2007, P 30 ANN INT ACM SIG, P183, DOI DOI 10.1145/1277741.1277775