Compressing and Indexing Labeled Trees, with Applications

被引:92
作者
Ferragina, Paolo [1 ]
Luccio, Fabrizio [1 ]
Manzini, Giovanni [2 ]
Muthukrishnan, S. [3 ]
机构
[1] Univ Pisa, Dipartimento Informat, I-56127 Pisa, Italy
[2] Univ Piemonte Orientale, Dipartimento Informat, I-15100 Alessandria, Italy
[3] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08855 USA
关键词
Algorithms; Design; Experimentation; Theory; Burrows-Wheeler transform; labeled tree compression; labeled tree indexing; path searching; tree navigation; XML compression; XML indexing; SUCCINCT REPRESENTATIONS; SPACE REQUIREMENT; XML;
D O I
10.1145/1613676.1613680
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Consider an ordered, static tree T where each node has a label from alphabet Sigma. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e. parent, ith child, or any child with some label, ...) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-knownBurrows-Wheeler transform for strings [1994]. The XBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and path-search operations over labeled trees within (near-) optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is significantly better than the one achievable by state-of-the-art tools, and its query time performance is order of magnitudes faster.
引用
收藏
页数:33
相关论文
共 56 条
[1]  
Adiego J, 2004, IEEE DATA COMPR CONF, P522
[2]  
ARION A, 2003, PROC VLDB ENDOW, P1065
[3]  
Arroyuelo D, 2006, LECT NOTES COMPUT SC, V4009, P318
[4]  
BARBAY J, 2008, P 18 ACM SIAM S DISC
[5]   Adaptive searching in succinctly encoded binary relations and tree-structured documents [J].
Barbay, Jeremy ;
Golynski, Alexander ;
Munro, J. Ian ;
Rao, S. Srinivasa .
THEORETICAL COMPUTER SCIENCE, 2007, 387 (03) :284-297
[6]   Representing trees of higher degree [J].
Benoit, D ;
Demaine, ED ;
Munro, JI ;
Raman, R ;
Raman, V ;
Rao, SS .
ALGORITHMICA, 2005, 43 (04) :275-292
[7]  
Burrows M, 1994, BLOCK SORTING LOSSLE
[8]   Efficient memory representation of XML document trees [J].
Busatto, Giorgio ;
Lohrey, Markus ;
Maneth, Sebastian .
INFORMATION SYSTEMS, 2008, 33 (4-5) :456-474
[9]   XML document indexes: A classification [J].
Catania, B ;
Maddalena, A ;
Vakali, A .
IEEE INTERNET COMPUTING, 2005, 9 (05) :64-71
[10]  
Cheney J, 2001, IEEE DATA COMPR CONF, P163, DOI 10.1109/DCC.2001.917147