Generalization of Repetitiveness Measures for Two-Dimensional Strings

被引:0
|
作者
Carfagna, Lorenzo [1 ]
Manzini, Giovanni [1 ,2 ]
Romana, Giuseppe [2 ]
Sciortino, Marinella
Urbina, Cristian [3 ,4 ]
机构
[1] Univ Pisa, Dipartimento Informat, Pisa, Italy
[2] Univ Palermo, Dipartimento Matemat & Informat, Palermo, Italy
[3] Univ Chile, Dept Comp Sci, Santiago, Chile
[4] Ctr Biotechnol & Bioengn CeBiB, Santiago, Chile
来源
STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2024 | 2025年 / 14899卷
关键词
Two-dimensional strings; Repetitiveness measures; Text compression; COMPRESSION;
D O I
10.1007/978-3-031-72200-4_5
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures delta and gamma, defined in terms of the submatrices of the input, as well as the measures g, grl, and b, which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.
引用
收藏
页码:57 / 72
页数:16
相关论文
共 50 条