Revisiting dictionary-based compression

Cited by: 26
Authors
Skibinski, P
Grabowski, S
Deorowicz, S
Affiliations
[1] Tech Univ Lodz, Dept Comp Engn, PL-90924 Lodz, Poland
[2] Univ Wroclaw, Inst Comp Sci, PL-51151 Wroclaw, Poland
[3] Silesian Tech Univ, Inst Comp Sci, PL-44100 Gliwice, Poland
Keywords
lossless data compression; preprocessing; text compression; dictionary compression
DOI
10.1002/spe.678
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
An attractive way to increase text compression is to replace words with references to a text dictionary given in advance. Although there exist a few works in this area, they do not fully exploit the compression possibilities or consider alternative preprocessing variants for various compressors in the latter phase. In this paper, we discuss several aspects of dictionary-based compression, including compact dictionary representation, and present a PPM/BWCA-oriented scheme, word replacing transformation, achieving compression ratios higher by 2-6% than the state-of-the-art StarNT (2003) text preprocessor, working at a greater speed. We also present an alternative scheme designed for LZ77 compressors, with the advantage over StarNT of reaching up to 14% in combination with gzip. Copyright (c) 2005 John Wiley & Sons, Ltd.
Pages: 1455 / 1476
Page count: 22
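
The central idea in the abstract, replacing words with references to a dictionary given in advance, can be illustrated with a toy preprocessor. This is only a sketch under stated assumptions and not the scheme from the paper: the function names, the codeword layout (one or two characters in the range U+0080 to U+00FF), and the ASCII-only input restriction are all hypothetical choices made for illustration. Capitalization, punctuation modelling, and compact dictionary representation are not attempted here.

```python
import re

# Hypothetical codeword layout (an assumption, not the paper's):
#   indices 0..63    -> one character in U+0080..U+00BF
#   indices 64..8255 -> lead character in U+00C0..U+00FF
#                       followed by one character in U+0080..U+00FF
def make_codeword(index):
    if index < 64:
        return chr(128 + index)
    hi, lo = divmod(index - 64, 128)
    if hi >= 64:
        raise ValueError("dictionary too large for this toy codeword layout")
    return chr(192 + hi) + chr(128 + lo)

def encode(text, dictionary):
    """Replace every dictionary word in ASCII text with its codeword."""
    if not text.isascii():
        # Codewords use non-ASCII characters, so the input must not.
        raise ValueError("this sketch only handles ASCII input")
    codes = {word: make_codeword(i) for i, word in enumerate(dictionary)}
    # Words missing from the dictionary are copied through unchanged.
    return re.sub(r"[A-Za-z]+",
                  lambda m: codes.get(m.group(0), m.group(0)),
                  text)

def decode(encoded, dictionary):
    """Invert encode() using the same dictionary, in the same order."""
    def expand(match):
        s = match.group(0)
        if len(s) == 1:
            index = ord(s) - 128
        else:
            index = 64 + (ord(s[0]) - 192) * 128 + (ord(s[1]) - 128)
        return dictionary[index]
    return re.sub(r"[\x80-\xbf]|[\xc0-\xff][\x80-\xff]", expand, encoded)

if __name__ == "__main__":
    words = ["the", "of", "and", "compression"]  # most frequent words first
    sample = "the art of compression and the dictionary"
    packed = encode(sample, words)
    assert decode(packed, words) == sample
    print(len(sample), "->", len(packed), "characters")
```

Ordering the dictionary by descending frequency gives the most common words the shortest codewords. In practice the transformed text would then be passed to a general-purpose compressor such as gzip, the "latter phase" the abstract refers to.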