Improved Compressed Indexes for Full-Text Document Retrieval

被引:0
作者
Belazzougui, Djamal [1 ,2 ]
Navarro, Gonzalo [2 ]
机构
[1] Univ Paris 07, LIAFA, F-75221 Paris 05, France
[2] Univ Chile, Dept Comp Sci, Santiago, Chile
来源
STRING PROCESSING AND INFORMATION RETRIEVAL | 2011年 / 7024卷
关键词
EFFICIENT ALGORITHMS; QUERIES;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length 72, current approaches require at least vertical bar CSA vertical bar + O(n lg D/1g lg D) or 2 vertical bar CSA vertical bar + o(n) bits of space, where CSA is a full-text index. Using monotone mininum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just vertical bar CSA vertical bar + O(n lg lg lg D) bits. We also improve current solutions that use 2 vertical bar CSA vertical bar + o(n) bits, and consider other problems such as colored range listing, top-k, most important documents, and computing arbitrary frequencies.
引用
收藏
页码:386 / +
页数:3
相关论文
共 25 条
  • [1] Apostolico A., 1985, Combinatorial Algorithms on Words, P85
  • [2] Belazzougui D., 2009, ALENEX
  • [3] Belazzougui D, 2009, PROCEEDINGS OF THE TWENTIETH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P785
  • [4] Culpepper JS, 2010, LECT NOTES COMPUT SC, V6347, P194, DOI 10.1007/978-3-642-15781-3_17
  • [5] Compressed Representations of Sequences and Full-Text Indexes
    Ferragina, Paolo
    Manzini, Giovanni
    Makinen, Veli
    Navarro, Gonzalo
    [J]. ACM TRANSACTIONS ON ALGORITHMS, 2007, 3 (02)
  • [6] Optimal Succinctness for Range Minimum Queries
    Fischer, Johannes
    [J]. LATIN 2010: THEORETICAL INFORMATICS, 2010, 6034 : 158 - 169
  • [7] Gagie T, 2010, LECT NOTES COMPUT SC, V6393, P67, DOI 10.1007/978-3-642-16321-0_7
  • [8] Gagie T, 2009, LECT NOTES COMPUT SC, V5721, P1, DOI 10.1007/978-3-642-03784-9_1
  • [9] Grossi R, 2003, SIAM PROC S, P841
  • [10] Grossi R, 2010, LECT NOTES COMPUT SC, V6198, P678, DOI 10.1007/978-3-642-14165-2_57