Compression of low entropy strings with Lempel-Ziv algorithms

被引:63
|
作者
Kosaraju, SR [1 ]
Manzini, G
机构
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Univ Piemonte Orientale, Dip Sci & Tecnol Avanzate, I-15100 Alessandria, Italy
[3] CNR, Ist Matemat Computaz, I-56100 Pisa, Italy
关键词
data compression; Lempel-Ziv parsing; empirical entropy;
D O I
10.1137/S0097539797331105
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We compare the compression ratio of the Lempel-Ziv algorithms with the empirical entropy of the input string. This approach makes it possible to analyze the performance of these algorithms without any assumption on the input and to obtain worst case results. We show that in this setting the standard definition of optimal compression algorithm is not satisfactory. In fact, although Lempel-Ziv algorithms are optimal according to the standard definition, there exist families of low entropy strings which are not compressed optimally. More precisely, the compression ratio achieved by LZ78 (resp., LZ77) can be much higher than the zeroth order entropy H-0 (resp., the first order entropy H-1). For this reason we introduce the concept of lambda-optimal algorithm. An algorithm is lambda-optimal with respect to H-k if, loosely speaking, its compression ratio is asymptotically bounded by lambda times the kth order empirical entropy H-k. We prove that LZ78 cannot be lambda-optimal with respect to any H-k with k greater than or equal to 0. Then, we describe a new algorithm which combines LZ78 with run length encoding (RLE) and is 3-optimal with respect to H-0. Finally, we prove that LZ77 is 8-optimal with respect to H-0, and that it cannot be lambda-optimal with respect to H-k for any k greater than or equal to 1.
引用
收藏
页码:893 / 911
页数:19
相关论文
共 50 条
  • [1] Compression of low entropy strings with Lempel-Ziv algorithms
    Kosaraju, SR
    Manzini, G
    COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998, : 107 - 121
  • [2] Lempel-Ziv dimension for Lempel-Ziv compression
    Lopez-Valdes, Maria
    MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2006, PROCEEDINGS, 2006, 4162 : 693 - 703
  • [3] Automata on Lempel-Ziv compressed strings
    Leiss, H
    de Rougemont, M
    COMPUTER SCIENCE LOGIC, PROCEEDINGS, 2003, 2803 : 384 - 396
  • [4] Generalized Lempel-Ziv compression for audio
    Kirovski, Darko
    Landau, Zeph
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (02): : 509 - 518
  • [5] String matching in Lempel-Ziv compressed strings
    Farach, M
    Thorup, M
    ALGORITHMICA, 1998, 20 (04) : 388 - 404
  • [6] ESTIMATION OF THE ENTROPY BY THE LEMPEL-ZIV METHOD
    HANSEL, G
    LECTURE NOTES IN COMPUTER SCIENCE, 1989, 377 : 51 - 65
  • [7] Generalized Lempel-Ziv compression for audio
    Kirovski, D
    Landau, Z
    2004 IEEE 6TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2004, : 127 - 130
  • [8] Lempel-Ziv compression of structured text
    Adiego, J
    Navarro, G
    de la Fuente, P
    DCC 2004: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2004, : 112 - 121
  • [9] Lempel-Ziv and Multiscale Lempel-Ziv Complexity in Depression
    Kalev, K.
    Bachmann, M.
    Orgo, L.
    Lass, J.
    Hinrikus, H.
    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015, : 4158 - 4161
  • [10] Lempel-Ziv compression of highly structured documents
    Adiego, Joaquin
    Navarro, Gonzalo
    de la Fuente, Pablo
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2007, 58 (04): : 461 - 478