Quasi-distinct parsing and optimal compression methods

被引:4
作者
Amir, Amihood [2 ,3 ]
Aumann, Yonatan [2 ]
Levy, Avivit [1 ,4 ]
Roshko, Yuri [1 ]
机构
[1] Shenkar Coll, IL-52526 Ramat Gan, Israel
[2] Bar Ilan Univ, Dept Comp Sci, IL-52900 Ramat Gan, Israel
[3] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[4] Univ Haifa, CRI, IL-31905 Haifa, Israel
基金
以色列科学基金会; 美国国家科学基金会;
关键词
Optimal compression; Parsing; Arithmetic progressions; INDIVIDUAL SEQUENCES; ALGORITHM;
D O I
10.1016/j.tcs.2011.11.022
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In this paper, the optimality proof of Ziv-Lempel coding is re-studied, and a more general compression optimality theorem is derived. In particular, the property of quasi-distinct parsing is defined. This property allows infinitely many repetitions of phrases in the parsing as long as the total number of repetitions is 0(n/ log n), where n is length of the parsed string. The quasi-distinct parsing property is weaker than distinct parsing used in the original proof which does not allow repetitions of phrases in the parsing. Yet we show that the theorem holds with this weaker property as well. This provides a better understanding of the optimality proof of Ziv-Lempel coding, together with a new tool for proving optimality of other compression schemes which is applicable for a much wider family of codes. To demonstrate the possible use of this generalization, a new coding method - the Arithmetic Progression Tree coding (APT) - is presented. This new coding method is based on a principle that is very different from Ziv-Lempel's coding. Nevertheless, the APT coding is analyzed in this paper and using the generalized theorem shown to be asymptotically optimal up to a constant factor,(1) if the APT quasi-distinctness hypothesis holds. An empirical evidence that this hypothesis holds is also given. (C) 2011 Elsevier B.V. All rights reserved.
引用
收藏
页码:1 / 14
页数:14
相关论文
共 16 条
[1]  
Amir A, 2008, FUND INFORM, V84, P1
[2]  
Bell T.C., 1990, Text Compression
[3]  
Burrows M, 1994, BLOCK SORTING LOSSLE
[4]  
Cover T.M., 2006, ELEMENTS INFORM THEO, V2nd ed
[5]  
JACOBSON G, 1989, P 30 ANN S FDN COMP, P549, DOI DOI 10.1109/SFCS.1989.63533
[6]   Grammar-based codes: A new class of universal lossless source codes [J].
Kieffer, JC ;
Yang, EH .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2000, 46 (03) :737-754
[7]   COMPLEXITY OF FINITE SEQUENCES [J].
LEMPEL, A ;
ZIV, J .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1976, 22 (01) :75-81
[8]  
LOUCHARD G, 1995, DAT COMPR C DCC, P262
[9]   An analysis of the Burrows-Wheeler Transform [J].
Manzini, G .
JOURNAL OF THE ACM, 2001, 48 (03) :407-430
[10]  
Nevill-Manning C. G., 1994, Proceedings DCC '94. Data Compression Conference (Cat. No.94TH0626-2), P244, DOI 10.1109/DCC.1994.305932