Fast Dictionary-Based Compression for Inverted Indexes

被引:16
作者
Pibiri, Giulio Ermanno [1 ,2 ]
Petri, Matthias [3 ]
Moffat, Alistair [3 ]
机构
[1] Univ Pisa, Pisa, Italy
[2] ISTI CNR, Pisa, Italy
[3] Univ Melbourne, Melbourne, Vic, Australia
来源
PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19) | 2019年
关键词
Inverted index; efficiency; compression; decoding; STORAGE;
D O I
10.1145/3289600.3290962
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Dictionary-based compression schemes provide fast decoding operation, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.
引用
收藏
页码:6 / 14
页数:9
相关论文
共 45 条
[1]   Inverted index compression using word-aligned binary codes [J].
Anh, VN ;
Moffat, A .
INFORMATION RETRIEVAL, 2005, 8 (01) :151-166
[2]   Index compression using 64-bit words [J].
Anh, Vo Ngoc ;
Moffat, Alistair .
SOFTWARE-PRACTICE & EXPERIENCE, 2010, 40 (02) :131-147
[3]   Off-line compression by greedy textual substitution [J].
Apostolico, A ;
Lonardi, S .
PROCEEDINGS OF THE IEEE, 2000, 88 (11) :1733-1744
[4]   LINEAR-APPROXIMATION OF SHORTEST SUPERSTRINGS [J].
BLUM, A ;
JIANG, T ;
LI, M ;
TROMP, J ;
YANNAKAKIS, M .
JOURNAL OF THE ACM, 1994, 41 (04) :630-647
[5]  
Brisaboa N.R., 2008, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P139, DOI DOI 10.1145/1390334.1390360
[6]  
Brisaboa NR, 2003, LECT NOTES COMPUT SC, V2857, P122
[7]  
Claude F., 2009, ABS09113318 CORR
[8]   Efficient Set Intersection for Inverted Indexing [J].
Culpepper, J. Shane ;
Moffat, Alistair .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2010, 29 (01)
[9]  
Culpepper JS, 2005, LECT NOTES COMPUT SC, V3772, P1
[10]  
Dean J., 2009, P WSDM