Efficient evaluation of generalized tree-pattern queries on XML streams

被引:0
|
作者
Xiaoying Wu
Dimitri Theodoratos
Calisto Zuzarte
机构
[1] New Jersey Institute of Technology,
[2] IBM Canada Ltd.,undefined
来源
The VLDB Journal | 2010年 / 19卷
关键词
XML; XPath evaluation; XML streams;
D O I
暂无
中图分类号
学科分类号
摘要
The streaming evaluation is a popular way of evaluating queries on XML documents. Besides its many advantages, it is also the only option for a number of important XML applications. Unfortunately, existing algorithms focus almost exclusively on tree-pattern queries (TPQs). Requirements for flexible querying of XML data have motivated recently the introduction of query languages that are more general and flexible than TPQs. These languages are not supported by existing algorithms. In this paper, we consider a partial tree-pattern query (PTPQ) language which generalizes and strictly contains TPQs. PTPQs can express a fragment of XPath which comprises reverse axes and the node identity equality (is) operator, in addition to forward axes, wildcards and predicates. They constitute an important subclass of XPath, which is very useful in practice. Unfortunately, previous streaming algorithms for TPQs cannot be applied to PTPQs. PTPQs can be represented as dags enhanced with constraints. We explore this representation to design an original polynomial time streaming algorithm for PTPQs. Our algorithm aggressively filters incoming data that is irrelevant to the query and wisely avoids processing redundant query matches (i.e., matches of the query dag that do not contribute to new solutions). Our algorithm is the first one to support the streaming evaluation of such a broad fragment of XPath. We provide an analysis of it, and conduct an extensive experimental evaluation of its performance and scalability. Compared to the only known streaming algorithm that supports TPQs extended with reverse axes, our algorithm performs better by orders of magnitude while consuming a much smaller fraction of memory space. Current streaming applications have stringent requirements on query response time and memory consumption because of the large (possibly unbounded) size of data they handle. In order to keep memory usage and CPU consumption low for the PTPQ streaming evaluation, we design another streaming algorithm called Eager PSX for PTPQs. Its key feature is that it applies an eager evaluation strategy to quickly determine when node matches should be returned as solutions to the user and also to proactively detect redundant matches. We theoretically analyze Eager PSX, and experimentally test its time and space performance and scalability. We compare it with PSX. Our results show that Eager PSX not only achieves better space performance without compromising time performance, but it also greatly improves query response time for both simple and complex queries, in many cases, by orders of magnitude.
引用
收藏
页码:661 / 686
页数:25
相关论文
共 50 条
  • [21] Efficient evaluation of XML twig queries
    Chang, YH
    Lee, CT
    Luo, CC
    ADVANCED WEB AND NETWORK TECHNOLOGIES, AND APPLICATIONS, PROCEEDINGS, 2006, 3842 : 48 - 57
  • [22] ViewJoin: Efficient View-based Evaluation of Tree Pattern Queries
    Chen, Ding
    Chan, Chee-Yong
    26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010, : 816 - 827
  • [23] Processing and Evaluating Partial Tree Pattern Queries on XML Data
    Wu, Xiaoying
    Souldatos, Stefanos
    Theodoratos, Dimitri
    Dalamagas, Theodore
    Vassiliou, Yannis
    Sellis, Timos
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (12) : 2244 - 2259
  • [24] Minimization of XML Tree Pattern Queries in the Presence of Integrity Constraints
    Chen, Yangjun
    Che, Dunren
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2006, 10 (05) : 744 - 751
  • [25] Efficient evaluation of XML path queries with automata
    Sun, B
    Lv, JH
    Wang, GR
    Yu, G
    Zhou, B
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2003, 2762 : 116 - 127
  • [26] Answering ordered tree pattern queries over fuzzy XML data
    Jian Liu
    Z. M. Ma
    Xue Feng
    Knowledge and Information Systems, 2015, 43 : 473 - 495
  • [27] Answering ordered tree pattern queries over fuzzy XML data
    Liu, Jian
    Ma, Z. M.
    Feng, Xue
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 43 (02) : 473 - 495
  • [28] Efficient evaluation of XML middle-ware queries
    Fernandez, M
    Morishima, A
    Suciu, D
    SIGMOD RECORD, 2001, 30 (02) : 103 - 114
  • [29] Efficient evaluation of multiple queries on streamed XML fragments
    Huo, Huan
    Zhou, Rui
    Wang, Guoren
    Hui, Xiaoyun
    Xiao, Chuan
    Yu, Yongqian
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2006, 4016 : 61 - 72
  • [30] EFFICIENT EVALUATION OF XML TWIG QUERIES WITH KEYWORD CONSTRAINTS
    Chang, Ya-Hui
    Luo, Chieh-Chang
    Huang, Chih-Chung
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2009, 32 (04) : 469 - 480