Inverted index compression using word-aligned binary codes

被引:154
作者
Anh, VN [1 ]
Moffat, A [1 ]
机构
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Melbourne, Vic 3010, Australia
来源
INFORMATION RETRIEVAL | 2005年 / 8卷 / 01期
基金
澳大利亚研究理事会;
关键词
index compression; integer coding; index representation;
D O I
10.1023/B:INRT.0000048490.99518.5c
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
引用
收藏
页码:151 / 166
页数:16
相关论文
共 19 条
[1]  
ANH VN, 2004, P 15 AUSTR DAT C JAN, P61
[2]  
BAEZAYATES RA, 1999, MODERN INFORMATION R
[3]   Engineering a multi-purpose test collection for Web retrieval experiments [J].
Bailey, P ;
Craswell, N ;
Hawking, D .
INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (06) :853-871
[4]   Index compression through document reordering [J].
Blandford, D ;
Blelloch, G .
DCC 2002: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2002, :342-351
[5]  
CRASWELL N, 2002, NIST SPECIAL PUBLICA, P248
[6]   Fast and flexible word searching on compressed text [J].
de Moura, ES ;
Navarro, G ;
Ziviani, N ;
BaezaYates, R .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2000, 18 (02) :113-139
[7]  
FRAKES W, 1992, INFORMATION RETRIEVA
[8]   OVERVIEW OF THE 2ND TEXT RETRIEVAL CONFERENCE (TREC-2) [J].
HARMAN, D .
INFORMATION PROCESSING & MANAGEMENT, 1995, 31 (03) :271-289
[9]   Binary interpolative coding for effective index compression [J].
Moffat, A ;
Stuiver, L .
INFORMATION RETRIEVAL, 2000, 3 (01) :25-47
[10]  
Persin M, 1996, J AM SOC INFORM SCI, V47, P749, DOI 10.1002/(SICI)1097-4571(199610)47:10<749::AID-ASI3>3.0.CO