HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances

被引:1
作者
Koppl, Dominik [1 ]
Navarro, Gonzalo [2 ]
Prezza, Nicola [3 ]
机构
[1] TMDU, Tokyo, Japan
[2] Univ Chile, Dept Comp Sci, Santiago, Chile
[3] Ca Foscari Univ, DAIS, Venice, Italy
来源
DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC) | 2022年
关键词
COMPLEXITY; ALGORITHM;
D O I
10.1109/DCC52660.2022.00016
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We propose a new representation of the offsets of the Lempel-Ziv (LZ) factorization based on the co-lexicographic order of the text's prefixes. The selected offsets tend to approach the k-th order empirical entropy. Our evaluations show that this choice is superior to the rightmost and bit-optimal LZ parsings on datasets with small high-order entropy.
引用
收藏
页码:83 / 92
页数:10
相关论文
共 16 条
  • [1] Belazzougui D., 2016, SODA 2016, P2053, DOI DOI 10.1137/1.9781611974331.CH143
  • [2] Burrows M., 1994, 124 DIG SYST RES CTR
  • [3] Cover T.M., 2012, Elements of Information Theory
  • [4] A simple algorithm for computing the Lempel-Ziv factorization
    Crochemore, Maxime
    Ilie, Lucian
    Smyth, W. F.
    [J]. DCC: 2008 DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2008, : 482 - +
  • [5] BICRITERIA DATA COMPRESSION
    Farruggia, Andrea
    Ferragina, Paolo
    Frangioni, Antonio
    Venturini, Rossano
    [J]. SIAM JOURNAL ON COMPUTING, 2019, 48 (05) : 1603 - 1642
  • [6] ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION
    Ferragina, Paolo
    Nitto, Igor
    Venturini, Rossano
    [J]. SIAM JOURNAL ON COMPUTING, 2013, 42 (04) : 1521 - 1541
  • [7] Grossi R, 2003, SIAM PROC S, P841
  • [8] Köoppl D, 2021, Arxiv, DOI arXiv:2111.02478
  • [9] COMPLEXITY OF FINITE SEQUENCES
    LEMPEL, A
    ZIV, J
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1976, 22 (01) : 75 - 81
  • [10] Compressed Data Structures for Dynamic Sequences
    Munro, J. Ian
    Nekrich, Yakov
    [J]. ALGORITHMS - ESA 2015, 2015, 9294 : 891 - 902