Index compression using 64-bit words

Cited by: 64
Authors
Anh, Vo Ngoc [1]
Moffat, Alistair [1]
Affiliations
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Melbourne, Vic 3010, Australia
Source
SOFTWARE-PRACTICE & EXPERIENCE | 2010 / Vol. 40 / Issue 02
Funding
Australian Research Council;
Keywords
performance; measurement; index compression; information retrieval; TEXT RETRIEVAL; INFORMATION-RETRIEVAL; INVERTED FILES; SYSTEMS;
DOI
10.1002/spe.948
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Modern computers typically make use of 64-bit words as the fundamental unit of data access. However, the decade-long migration from 32-bit architectures has not been reflected in compression technology, because of a widespread assumption that effective compression techniques operate in terms of bits or bytes, rather than words. Here we demonstrate that the use of 64-bit access units, especially in connection with word-bounded codes, does indeed provide the opportunity for improving the compression performance. In particular, we extend several 32-bit word-bounded coding schemes to 64-bit operation and explore their uses in information retrieval applications. Our results show that the Simple-8b approach, a 64-bit word-bounded code, is an excellent self-skipping code, and has a clear advantage over its competitors in supporting fast query evaluation when the data being compressed represents the inverted index for a large text collection. The advantages of the new code also accrue on 32-bit architectures, and for all of Boolean, ranked, and phrase queries, which means that it can be used in any situation. Copyright (C) 2010 John Wiley & Sons, Ltd.
Pages: 131 - 147
Page count: 17
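Illustrative sketch
The abstract describes word-bounded codes, in which each 64-bit word carries a whole number of integers. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes the commonly described Simple-8b arrangement of a 4-bit selector followed by 60 data bits. The names LAYOUTS, encode and decode, the selector numbering, and the bit ordering inside a word are hypothetical choices made for illustration, and the special selectors for long runs of identical values are omitted.

```python
# Each layout is (count, bits): `count` integers of `bits` bits each,
# stored in the 60 data bits that follow a 4-bit selector.
# Assumption: table modelled on the commonly described Simple-8b layouts;
# the selector numbering here (0..13) is illustrative only.
LAYOUTS = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
           (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]


def encode(values):
    """Pack non-negative integers below 2**60 into a list of 64-bit words."""
    words = []
    i = 0
    while i < len(values):
        # Try the densest layout first and fall back to wider fields.
        for sel, (count, bits) in enumerate(LAYOUTS):
            chunk = values[i:i + count]
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = sel << 60  # selector occupies the top 4 bits
                for j, v in enumerate(chunk):
                    word |= v << (j * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value is negative or does not fit in 60 bits")
    return words


def decode(words):
    """Recover the integer sequence from words produced by encode()."""
    out = []
    for word in words:
        count, bits = LAYOUTS[word >> 60]
        mask = (1 << bits) - 1
        for j in range(count):
            out.append((word >> (j * bits)) & mask)
    return out


# Hypothetical gaps between document numbers in one inverted list.
gaps = [3, 1, 4, 1, 5, 9, 2, 6, 1000, 70000]
packed = encode(gaps)
restored = decode(packed)
```

Because every word is decoded independently of its neighbours, a reader can step over whole words without unpacking them, which is one way to understand the self-skipping property mentioned in the abstract.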