Sub-optimal data compression and the subset sum problem

被引:0
作者
Katti, Raj [1 ]
Srinivasan, Sudarshan [1 ]
机构
[1] N Dakota State Univ, Dept Elect & Comp Engn, Fargo, ND 58105 USA
基金
美国国家科学基金会;
关键词
Data compression; Shannon Fano Elias codes; Subset sum problem;
D O I
10.1016/j.aeue.2010.01.011
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
We propose an efficient sub-optimal prefix code construction method for discrete sources with a finite alphabet and known probability mass function (pmf) It is well known that for a source that puts out symbols x with probability p, the optimal codeword lengths are l(1) = log(1/p(1)) However codeword lengths are integers and log(1/p(1)) is in general not an integer We propose a method to find binary codewords for x(1) whose lengths are initially assumed to be [log(1/p(i))]-1 Every prefix code must satisfy the Kraft s inequality but our initial codeword lengths may not satisfy the Kraft s inequality Using a simplified version of the subset sum problem we find a minimal set of codeword lengths that must be increased from [log(1/p(1))]-1 to [log(1/p(1))] so that Kraft s inequality is satisfied Even though this solution is not optimal it leads to average codeword lengths that are close to optimal and in some cases codeword lengths that are optimal Unlike the Huffman code our solution does not require the ordering of probabilities in the pmf The efficiency of our method can be further improved by reducing the size of the subset sum problem The example of English text shows that our method leads to a solution that is very close to the optimal solution The proposed method can also be used for encryption thereby accomplishing both compression and encryption simultaneously (C) 2010 Elsevier GmbH All rights reserved
引用
收藏
页码:53 / 61
页数:9
相关论文
共 11 条
[1]  
[Anonymous], 2006, Elements of Information Theory
[2]  
Blelloch GE, 2001, LECT NOTES COURSE AL
[3]   HARD KNAPSACK-PROBLEMS [J].
CHVATAL, V .
OPERATIONS RESEARCH, 1980, 28 (06) :1402-1411
[4]  
Cormack GV, 1952, P IRE, V40, P1098
[5]   On breaking a Huffman code [J].
Gillman, DW ;
Mohtashemi, M ;
Rivest, RL .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1996, 42 (03) :972-976
[6]   A METHOD FOR THE CONSTRUCTION OF MINIMUM-REDUNDANCY CODES [J].
HUFFMAN, DA .
PROCEEDINGS OF THE INSTITUTE OF RADIO ENGINEERS, 1952, 40 (09) :1098-1101
[7]  
MacKay D. J. C., 2003, Information Theory, Inference and Learning Algorithms
[8]  
Rissanen J, 1984, INFORM PROCESS LETT, V18, P159
[9]  
Sayood K., 2005, INTRO DATA COMPRESSI
[10]   A MATHEMATICAL THEORY OF COMMUNICATION [J].
SHANNON, CE .
BELL SYSTEM TECHNICAL JOURNAL, 1948, 27 (03) :379-423