Sublinear Time Lempel-Ziv (LZ77) Factorization

被引:0
作者
Ellert, Jonas [1 ]
机构
[1] Tech Univ Dortmund, Dortmund, Germany
来源
STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2023 | 2023年 / 14240卷
关键词
Lempel-Ziv; LZ77; LZ-like; lossless compression; word-packing; sublinear time; string algorithms; approximation algorithms; DATA-COMPRESSION; ALGORITHM; COMPLEXITY;
D O I
10.1007/978-3-031-43980-3_14
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that plays a central role in data compression and indexing. For a length-n string over integer alphabet [0, sigma) with s = n(O(1)), and on a word RAM of width w = Theta(log n), it can be computed in O(n) time. However, the packed representation of the string occupies only Theta(n log sigma) bits or equivalently Theta(n / log(sigma) n) words of space, and hence we can hope for algorithms that run in O(n / log(sigma) n) time and words of space. Kempa showed how to compute the LZ77 factorization with overlaps in O(n / log(sigma) n+z log(11) n) time and O(n / log(sigma) n+z log(10) n) words of space, where z is the number of phrases in the LZ77 factorization (SODA 2019). We significantly improve this result by achieving O(n / log(sigma) n + z log(3+epsilon) z) time with overlaps, and O(n / log(sigma) n + z log(23/5+epsilon) z) without overlaps (for any constant epsilon is an element of R+). In both cases, we require only O(n / log(sigma) n) words of space. One ingredient of the solution is a novel approximation algorithm that computes an LZ-like parsing of at most 3z phrases in O(n / log(sigma) n) time and words of space. All algorithms are deterministic.
引用
收藏
页码:171 / 187
页数:17
相关论文
共 50 条
  • [31] Compression of low entropy strings with Lempel-Ziv algorithms
    Kosaraju, SR
    Manzini, G
    SIAM JOURNAL ON COMPUTING, 2000, 29 (03) : 893 - 911
  • [32] Redundancy estimates for the Lempel-Ziv algorithm of data compression
    Potapov, VN
    DISCRETE APPLIED MATHEMATICS, 2004, 135 (1-3) : 245 - 254
  • [33] Performance Evaluation of Tightening Equipment Based on Information Entropy and Lempel-Ziv
    Fan G.
    Li A.
    Liu X.
    Gu J.
    Xu L.
    Zhendong Ceshi Yu Zhenduan/Journal of Vibration, Measurement and Diagnosis, 2019, 39 (01): : 88 - 94and223
  • [34] Time-space trade-offs for Lempel-Ziv compressed indexing
    Bille, Philip
    Ettienne, Mikko Berggren
    Gortz, Inge Li
    Vildhoj, Hjalte Wedel
    THEORETICAL COMPUTER SCIENCE, 2018, 713 : 66 - 77
  • [35] On the Suitability of Suffix Arrays for Lempel-Ziv Data Compression
    Ferreira, Artur J.
    Oliveira, Arlindo L.
    Figueiredo, Mario A. T.
    E-BUSINESS AND TELECOMMUNICATIONS, 2009, 48 : 267 - +
  • [36] Stronger Lempel-Ziv Based Compressed Text Indexing
    Arroyuelo, Diego
    Navarro, Gonzalo
    Sadakane, Kunihiko
    ALGORITHMICA, 2012, 62 (1-2) : 54 - 101
  • [37] A Normal Sequence Compressed by PPM* But Not by Lempel-Ziv 78
    Jordon, Liam
    Moser, Philippe
    SOFSEM 2021: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2021, 12607 : 389 - 399
  • [38] Improving a lightweight LZ77 computation algorithm for running faster
    Liu, Wei Jun
    Nong, Ge
    Chan, Wai Hong
    Wu, Yi
    SOFTWARE-PRACTICE & EXPERIENCE, 2016, 46 (09) : 1201 - 1217
  • [39] Faster and Simpler Online/Sliding Rightmost Lempel-Ziv Factorizations
    Sumiyoshi, Wataru
    Mieno, Takuya
    Inenaga, Shunsuke
    STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2024, 2025, 14899 : 321 - 335
  • [40] Average Profile of the Lempel-Ziv Parsing Scheme for a Markovian Source
    P. Jacquet
    W. Szpankowski
    J. Tang
    Algorithmica, 2001, 31 : 318 - 360