Sublinear Time Lempel-Ziv (LZ77) Factorization

被引:0
|
作者
Ellert, Jonas [1 ]
机构
[1] Tech Univ Dortmund, Dortmund, Germany
来源
STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2023 | 2023年 / 14240卷
关键词
Lempel-Ziv; LZ77; LZ-like; lossless compression; word-packing; sublinear time; string algorithms; approximation algorithms; DATA-COMPRESSION; ALGORITHM; COMPLEXITY;
D O I
10.1007/978-3-031-43980-3_14
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that plays a central role in data compression and indexing. For a length-n string over integer alphabet [0, sigma) with s = n(O(1)), and on a word RAM of width w = Theta(log n), it can be computed in O(n) time. However, the packed representation of the string occupies only Theta(n log sigma) bits or equivalently Theta(n / log(sigma) n) words of space, and hence we can hope for algorithms that run in O(n / log(sigma) n) time and words of space. Kempa showed how to compute the LZ77 factorization with overlaps in O(n / log(sigma) n+z log(11) n) time and O(n / log(sigma) n+z log(10) n) words of space, where z is the number of phrases in the LZ77 factorization (SODA 2019). We significantly improve this result by achieving O(n / log(sigma) n + z log(3+epsilon) z) time with overlaps, and O(n / log(sigma) n + z log(23/5+epsilon) z) without overlaps (for any constant epsilon is an element of R+). In both cases, we require only O(n / log(sigma) n) words of space. One ingredient of the solution is a novel approximation algorithm that computes an LZ-like parsing of at most 3z phrases in O(n / log(sigma) n) time and words of space. All algorithms are deterministic.
引用
收藏
页码:171 / 187
页数:17
相关论文
共 50 条
  • [1] A Comparison of Index-Based Lempel-Ziv LZ77 Factorization Algorithms
    Al-Hafeedh, Anisa
    Crochemore, Maxime
    Ilie, Lucian
    Kopylova, Evguenia
    Smyth, W. F.
    Tischler, German
    Yusufu, Munina
    ACM COMPUTING SURVEYS, 2012, 45 (01)
  • [2] Lempel-Ziv Factorization Revisited
    Ohlebusch, Enno
    Gog, Simon
    COMBINATORIAL PATTERN MATCHING, 22ND ANNUAL SYMPOSIUM, CPM 2011, 2011, 6661 : 15 - 26
  • [3] Lempel-Ziv Factorization Using Less Time & Space
    Chen, Gang
    Puglisi, Simon J.
    Smyth, W. F.
    MATHEMATICS IN COMPUTER SCIENCE, 2008, 1 (04) : 605 - 623
  • [4] Computing Lempel-Ziv Factorization Online
    Starikovskaya, Tatiana
    MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2012, 2012, 7464 : 789 - 799
  • [5] Practical Parallel Lempel-Ziv Factorization
    Shun, Julian
    Zhao, Fuyao
    2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 123 - 132
  • [6] Linear Time Lempel-Ziv Factorization: Simple, Fast, Small
    Karkkainen, Juha
    Kempa, Dominik
    Puglisi, Simon J.
    COMBINATORIAL PATTERN MATCHING, 2013, 7922 : 189 - 200
  • [7] Lempel-Ziv dimension for Lempel-Ziv compression
    Lopez-Valdes, Maria
    MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2006, PROCEEDINGS, 2006, 4162 : 693 - 703
  • [8] Reversed Lempel-Ziv Factorization with Suffix Trees
    Koppl, Dominik
    ALGORITHMS, 2021, 14 (06)
  • [9] Enumerative Implementation of Lempel-Ziv 77 Algorithm
    Kawabata, Tsutomu
    2008 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS, VOLS 1-6, 2008, : 990 - 994
  • [10] Computing Reversed Lempel-Ziv Factorization Online
    Sugimoto, Shiho
    Tomohiro, I
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    PROCEEDINGS OF THE PRAGUE STRINGOLOGY CONFERENCE 2013, 2013, : 107 - 118