Inverted index compression using word-aligned binary codes

被引:152
作者
Anh, VN [1 ]
Moffat, A [1 ]
机构
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Melbourne, Vic 3010, Australia
来源
INFORMATION RETRIEVAL | 2005年 / 8卷 / 01期
基金
澳大利亚研究理事会;
关键词
index compression; integer coding; index representation;
D O I
10.1023/B:INRT.0000048490.99518.5c
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
引用
收藏
页码:151 / 166
页数:16
相关论文
共 19 条
  • [1] ANH VN, 2004, P 15 AUSTR DAT C JAN, P61
  • [2] BAEZAYATES RA, 1999, MODERN INFORMATION R
  • [3] Engineering a multi-purpose test collection for Web retrieval experiments
    Bailey, P
    Craswell, N
    Hawking, D
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (06) : 853 - 871
  • [4] Index compression through document reordering
    Blandford, D
    Blelloch, G
    [J]. DCC 2002: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2002, : 342 - 351
  • [5] CRASWELL N, 2002, NIST SPECIAL PUBLICA, P248
  • [6] Fast and flexible word searching on compressed text
    de Moura, ES
    Navarro, G
    Ziviani, N
    BaezaYates, R
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2000, 18 (02) : 113 - 139
  • [7] FRAKES W, 1992, INFORMATION RETRIEVA
  • [8] OVERVIEW OF THE 2ND TEXT RETRIEVAL CONFERENCE (TREC-2)
    HARMAN, D
    [J]. INFORMATION PROCESSING & MANAGEMENT, 1995, 31 (03) : 271 - 289
  • [9] Binary interpolative coding for effective index compression
    Moffat, A
    Stuiver, L
    [J]. INFORMATION RETRIEVAL, 2000, 3 (01): : 25 - 47
  • [10] Persin M, 1996, J AM SOC INFORM SCI, V47, P749, DOI 10.1002/(SICI)1097-4571(199610)47:10<749::AID-ASI3>3.0.CO