Efficient memory representation of XML document trees

被引:60
作者
Busatto, Giorgio [2 ]
Lohrey, Markus [1 ]
Maneth, Sebastian [3 ]
机构
[1] Univ Leipzig, Inst Informat, D-04103 Leipzig, Germany
[2] Carl von Ossietzky Univ Oldenburg, Dept Informat, D-2900 Oldenburg, Germany
[3] Univ New S Wales, Sydney, NSW, Australia
关键词
tree grammar; compression; in-memory XML representation;
D O I
10.1016/j.is.2008.01.004
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. In this paper, a technique is presented that allows to represent the tree structure of an XML document in an efficient way. The representation exploits the high regularity in XML documents by compressing their tree structure; the latter means to detect and remove repetitions of tree patterns. Formally, context-free tree grammars that generate only a single tree are used for tree compression. The functionality of basic tree operations, like traversal along edges, is preserved under this compressed representation. This allows to directly execute queries (and in particular, bulk operations) without prior decompression. The complexity of certain computational problems like validation against XML types or testing equality is investigated for compressed input trees. (C) 2008 Elsevier B.V. All rights reserved.
引用
收藏
页码:456 / 474
页数:19
相关论文
共 35 条
[1]  
ARION A, 2003, PROC VLDB ENDOW, P1065
[2]  
Bar-Hillel Y., 1961, Z. Phonetik Sprachwiss. Kommunikat., V14, P143, DOI 10.1524/stuf.1961.14.14.143
[3]  
Buneman P, 2005, PROC INT CONF DATA, P261
[4]  
Buneman Peter, 2003, 29 INT C VER LARG DA, P141, DOI DOI 10.1016/B978-012722442-8/50021-5
[5]   The smallest grammar problem [J].
Charikar, M ;
Lehman, E ;
Liu, D ;
Panigrahy, R ;
Prabhakaran, M ;
Sahai, A ;
Shelat, A .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (07) :2554-2576
[6]  
CHEN S, 1996, P DCC 96
[7]  
CHENEY JR, 2004, COMMUNICATION
[8]  
CHENEY JR, 1998, THESIS CARNEGIE MELL
[9]  
Cheng J, 2004, LECT NOTES COMPUT SC, V2992, P219
[10]  
FERNANDEZ MF, 2003, P 29 INT C VER LARG, P1077