Sublinear Time Lempel-Ziv (LZ77) Factorization

被引:0
作者
Ellert, Jonas [1 ]
机构
[1] Tech Univ Dortmund, Dortmund, Germany
来源
STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2023 | 2023年 / 14240卷
关键词
Lempel-Ziv; LZ77; LZ-like; lossless compression; word-packing; sublinear time; string algorithms; approximation algorithms; DATA-COMPRESSION; ALGORITHM; COMPLEXITY;
D O I
10.1007/978-3-031-43980-3_14
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that plays a central role in data compression and indexing. For a length-n string over integer alphabet [0, sigma) with s = n(O(1)), and on a word RAM of width w = Theta(log n), it can be computed in O(n) time. However, the packed representation of the string occupies only Theta(n log sigma) bits or equivalently Theta(n / log(sigma) n) words of space, and hence we can hope for algorithms that run in O(n / log(sigma) n) time and words of space. Kempa showed how to compute the LZ77 factorization with overlaps in O(n / log(sigma) n+z log(11) n) time and O(n / log(sigma) n+z log(10) n) words of space, where z is the number of phrases in the LZ77 factorization (SODA 2019). We significantly improve this result by achieving O(n / log(sigma) n + z log(3+epsilon) z) time with overlaps, and O(n / log(sigma) n + z log(23/5+epsilon) z) without overlaps (for any constant epsilon is an element of R+). In both cases, we require only O(n / log(sigma) n) words of space. One ingredient of the solution is a novel approximation algorithm that computes an LZ-like parsing of at most 3z phrases in O(n / log(sigma) n) time and words of space. All algorithms are deterministic.
引用
收藏
页码:171 / 187
页数:17
相关论文
共 50 条
  • [41] Average profile of the Lempel-Ziv parsing scheme for a Markovian source
    Jacquet, P
    Szpankowski, W
    Tang, J
    ALGORITHMICA, 2001, 31 (03) : 318 - 360
  • [42] Suffix arrays - A competitive choice for fast Lempel-Ziv compressions
    Ferreira, Artur J.
    Oliveira, Arlindo L.
    Figueiredo, Mario A. T.
    SIGMAP 2008: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2008, : 5 - +
  • [43] NEW ASYMPTOTIC BOUNDS AND IMPROVEMENTS ON THE LEMPEL-ZIV DATA-COMPRESSION ALGORITHM
    BENDER, PE
    WOLF, JK
    IEEE TRANSACTIONS ON INFORMATION THEORY, 1991, 37 (03) : 721 - 729
  • [44] An Alternative to NCD for Large Sequences, Lempel-Ziv Jaccard Distance
    Raff, Edward
    Nicholas, Charles
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 1007 - 1015
  • [45] Lempel-Ziv Jaccard Distance, an effective alternative to ssdeep and sdhash
    Raff, Edward
    Nicholas, Charles
    DIGITAL INVESTIGATION, 2018, 24 : 34 - 49
  • [46] HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances
    Koppl, Dominik
    Navarro, Gonzalo
    Prezza, Nicola
    DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC), 2022, : 83 - 92
  • [47] Relations Between Greedy and Bit-Optimal LZ77 Encodings
    Kosolobov, Dmitry
    35TH SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2018), 2018, 96
  • [48] Index for estimation of muscle force from mechanomyography based on the Lempel-Ziv algorithm
    Sarlabous, Leonardo
    Torres, Abel
    Fiz, Jose A.
    Morera, Josep
    Jane, Raimon
    JOURNAL OF ELECTROMYOGRAPHY AND KINESIOLOGY, 2013, 23 (03) : 548 - 557
  • [49] Quantitative trend fault diagnosis of a rolling bearing based on Sparsogram and Lempel-Ziv
    Cui, Lingli
    Li, Beibei
    Ma, Jianfeng
    Jin, Zhi
    MEASUREMENT, 2018, 128 : 410 - 418
  • [50] Efficient VLSI for Lempel-Ziv compression in wireless data communication networks
    Jung, BJ
    Burleson, WP
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 1998, 6 (03) : 475 - 483