Lempel-Ziv compression of structured text

Authors
Adiego, J [1]
Navarro, G [1]
de la Fuente, P [1]
Affiliations
[1] Univ Valladolid, Dept Informat, Valladolid, Spain
Keywords
Ziv-Lempel; XML data; text compression;
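DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract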
We describe a novel Lempel-Ziv approach suitable for compressing structured documents, called LZCS, which takes advantage of redundant information that can appear in the structure. The main idea is that frequently repeated subtrees may exist and these can be replaced by a backward reference to their first occurrence. The main advantage is that compressed documents generated by LZCS are easy to display, access at random, and navigate. In a second stage, processed documents can be further compressed using some semiadaptive technique, so that random access and navigability remain possible. LZCS is especially efficient at compressing collections of highly structured data, such as XML forms, invoices, e-commerce and web-service exchange documents. The comparison against structure-based and standard compressors shows that LZCS is a competitive choice for this type of document, while the others are not well-suited to support navigation or random access.
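The central idea of the abstract, replacing a repeated subtree with a backward reference to its first occurrence, can be illustrated with a minimal Python sketch. This is not the authors' LZCS implementation: the tuple-based tree representation, the token names (`open`, `text`, `close`, `ref`), and the functions `compress` and `decompress` are all assumptions made here for illustration, and the second-stage semiadaptive coding is not modeled.

```python
# Illustrative sketch only; names and token format are hypothetical,
# not taken from the LZCS paper.
# A tree node is (tag, (child, child, ...)); a child is a node or a text string.

def compress(tree):
    """Flatten a tree into tokens, replacing repeated subtrees by back references."""
    out = []     # flat token stream
    first = {}   # subtree -> index in `out` where its first occurrence starts

    def visit(node):
        if isinstance(node, str):
            out.append(("text", node))
            return
        if node in first:
            # Repeated subtree: emit a backward reference to its first occurrence.
            out.append(("ref", first[node]))
            return
        first[node] = len(out)
        tag, children = node
        out.append(("open", tag))
        for child in children:
            visit(child)
        out.append(("close", tag))

    visit(tree)
    return out


def decompress(tokens):
    """Rebuild the original tree by following back references."""
    def build(i):
        # Rebuild the node that starts at tokens[i]; return (node, next index).
        kind, value = tokens[i]
        if kind == "text":
            return value, i + 1
        if kind == "ref":
            node, _ = build(value)   # jump back to the first occurrence
            return node, i + 1
        children = []                # kind == "open"
        i += 1
        while tokens[i][0] != "close":
            child, i = build(i)
            children.append(child)
        return (value, tuple(children)), i + 1

    node, _ = build(0)
    return node


# A small invoice-like document in which the same <item> subtree appears twice.
item = ("item", (("sku", ("A1",)), ("qty", ("1",))))
doc = ("invoice", (item, ("note", ("rush",)), item))

tokens = compress(doc)
restored = decompress(tokens)
```

In this sketch the second `item` subtree collapses into a single `("ref", 1)` token, and because each reference stores the position of the first occurrence, a reader can resolve it by jumping directly to that position rather than decoding everything before it.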
Pages: 112-121
Number of pages: 10