Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques

Cited by: 5
Authors
Park, YH
Whang, KY
Lee, BS
Han, WS
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[2] Korea Adv Inst Sci & Technol, AITrc, Taejon 305701, South Korea
[3] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[4] Kyungpook Natl Univ, Dept Comp Engn, Taegu 702701, South Korea
Keywords
XML; inverted indexes; partial match queries; information retrieval
DOI
10.1016/j.jss.2005.05.009
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
We propose XIR-Linear, a method for efficiently evaluating linear path expressions (LPEs) on large-scale heterogeneous XML documents using information retrieval (IR) techniques. LPEs are the primary form of XPath queries, and techniques for evaluating them have been actively researched. XPath queries in their general form are partial match queries, which are particularly useful for searching documents with heterogeneous schemas. XIR-Linear is therefore geared toward partial match queries expressed as LPEs. It builds on existing methods that use relational tables (e.g., XRel, XParent) and drastically improves their efficiency with the inverted index technique. Specifically, it indexes the labels in label paths (i.e., sequences of node labels) like keywords in texts, and finds the label paths matching an LPE far more efficiently than the string matching used in the existing methods. We demonstrate the efficiency of XIR-Linear by comparing it with XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Linear outperforms XRel and XParent by an order of magnitude, with the performance gap widening as the database size grows. © 2005 Elsevier Inc. All rights reserved.
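
The idea of indexing the labels of label paths like keywords in texts can be illustrated with a small Python sketch. This is not the paper's implementation: the function names (`build_index`, `parse_lpe`, `match_lpe`), the in-memory dictionary layout, and the sample label paths are all assumptions made for illustration. The sketch treats each label path as a "document", records the positions of each label in a postings list, and evaluates an LPE by intersecting postings and then checking positions (`/` requires adjacent positions, `//` allows any gap).

```python
from collections import defaultdict
import re


def build_index(label_paths):
    """Build an inverted index: label -> {path_id: [positions of the label]}."""
    index = defaultdict(lambda: defaultdict(list))
    for pid, path in enumerate(label_paths):
        for pos, label in enumerate(path.strip("/").split("/")):
            index[label][pid].append(pos)
    return index


def parse_lpe(lpe):
    """Split an LPE such as '/a//b/c' into [(axis, label), ...]."""
    return re.findall(r"(//|/)([^/]+)", lpe)


def match_lpe(lpe, label_paths, index):
    """Return the label paths that match the LPE (hypothetical helper)."""
    steps = parse_lpe(lpe)
    if not steps:
        return []
    # Keyword-style filtering: keep only paths that contain every label.
    postings = [index.get(label, {}) for _, label in steps]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)

    results = []
    for pid in sorted(candidates):
        last_pos = len(label_paths[pid].strip("/").split("/")) - 1
        current = None  # positions where the previous step can be matched
        for (axis, _), posting in zip(steps, postings):
            positions = posting[pid]
            if current is None:
                # A leading '/' anchors the first label at the root.
                current = {q for q in positions if axis == "//" or q == 0}
            elif axis == "/":
                current = {q for q in positions if q - 1 in current}
            else:  # '//' allows any number of labels in between
                current = {q for q in positions if q > min(current)}
            if not current:
                break
        # The last label of the LPE must be the last label of the path.
        if current and last_pos in current:
            results.append(label_paths[pid])
    return results


if __name__ == "__main__":
    paths = [
        "/site/people/person/name",
        "/site/regions/item/name",
        "/site/people/person/address/city",
    ]
    idx = build_index(paths)
    print(match_lpe("//person/name", paths, idx))
    print(match_lpe("/site//name", paths, idx))
```

The design point is that candidate label paths are found through postings lookups and set intersection rather than by scanning every stored path with a string pattern, which is the kind of saving the abstract attributes to the inverted index.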
Pages: 180-190
Page count: 11