Sublinear Time Lempel-Ziv (LZ77) Factorization

被引:0
|
作者
Ellert, Jonas [1 ]
机构
[1] Tech Univ Dortmund, Dortmund, Germany
来源
STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2023 | 2023年 / 14240卷
关键词
Lempel-Ziv; LZ77; LZ-like; lossless compression; word-packing; sublinear time; string algorithms; approximation algorithms; DATA-COMPRESSION; ALGORITHM; COMPLEXITY;
D O I
10.1007/978-3-031-43980-3_14
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that plays a central role in data compression and indexing. For a length-n string over integer alphabet [0, sigma) with s = n(O(1)), and on a word RAM of width w = Theta(log n), it can be computed in O(n) time. However, the packed representation of the string occupies only Theta(n log sigma) bits or equivalently Theta(n / log(sigma) n) words of space, and hence we can hope for algorithms that run in O(n / log(sigma) n) time and words of space. Kempa showed how to compute the LZ77 factorization with overlaps in O(n / log(sigma) n+z log(11) n) time and O(n / log(sigma) n+z log(10) n) words of space, where z is the number of phrases in the LZ77 factorization (SODA 2019). We significantly improve this result by achieving O(n / log(sigma) n + z log(3+epsilon) z) time with overlaps, and O(n / log(sigma) n + z log(23/5+epsilon) z) without overlaps (for any constant epsilon is an element of R+). In both cases, we require only O(n / log(sigma) n) words of space. One ingredient of the solution is a novel approximation algorithm that computes an LZ-like parsing of at most 3z phrases in O(n / log(sigma) n) time and words of space. All algorithms are deterministic.
引用
收藏
页码:171 / 187
页数:17
相关论文
共 50 条
  • [21] Design of hardware accelerator for Lempel-Ziv 4 (LZ4) compression
    Lee, Sang Muk
    Jang, Ji Hoon
    Oh, Jung Hwan
    Kim, Ji Kwang
    Lee, Seung Eun
    IEICE ELECTRONICS EXPRESS, 2017, 14 (11):
  • [22] Application of Lempel-Ziv factorization to the approximation of grammar-based compression
    Rytter, W
    COMBINATORIAL PATTERN MATCHING, 2002, 2373 : 20 - 31
  • [23] Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections
    Hoobin, Christopher
    Puglisi, Simon J.
    Zobel, Justin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 5 (03): : 265 - 273
  • [24] Lempel-Ziv Factorization May Be Harder Than Computing All Runs
    Kosolobov, Dmitry
    32ND INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2015), 2015, 30 : 582 - 593
  • [25] Application of Lempel-Ziv factorization to the approximation of grammar-based compression
    Rytter, W
    THEORETICAL COMPUTER SCIENCE, 2003, 302 (1-3) : 211 - 222
  • [26] Pushdown and Lempel-Ziv depth
    Jordon, Liam
    Moser, Philippe
    INFORMATION AND COMPUTATION, 2023, 292
  • [27] On Lempel-Ziv complexity of sequences
    Doganaksoy, Ali
    Gologlu, Faruk
    SEQUENCES AND THEIR APPLICATIONS - SETA 2006, 2006, 4086 : 180 - 189
  • [28] On the Size of Lempel-Ziv and Lyndon Factorizations
    Karkkainen, Juha
    Kempa, Dominik
    Nakashima, Yuto
    Puglisi, Simon J.
    Shur, Arseny M.
    34TH SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2017), 2017, 66
  • [29] On the Approximation Ratio of Lempel-Ziv Parsing
    Gagie, Travis
    Navarro, Gonzalo
    Prezza, Nicola
    LATIN 2018: THEORETICAL INFORMATICS, 2018, 10807 : 490 - 503
  • [30] Generalized Lempel-Ziv compression for audio
    Kirovski, Darko
    Landau, Zeph
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (02): : 509 - 518