An efficient inverted index technique for XML documents using RDBMS

被引:17
作者
Seo, C
Lee, SW
Kim, HJ
机构
[1] Seoul Natl Univ, Sch Engn & Comp Sci, Gwanak Ku, Seoul 151742, South Korea
[2] Sungkyunkwan Univ, Sch Informat & Commun Engn, Suwon, South Korea
关键词
inverted index; XML; query processing;
D O I
10.1016/S0950-5849(02)00157-X
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The inverted index is widely used in the existing information retrieval field. In order to support containment queries for structured documents such as XML, it needs to be extended. Previous work suggested an extension in storing the inverted index for XML documents and processing containment queries, and compared two implementation options: using an RDBMS and using an Information Retrieval (IR) engine. However, the previous work has two drawbacks in extending the inverted index. One is that the RDBMS implementation is generally much worse in the performance than the IR engine implementation. The other is that when a containment query is processed in an RDBMS, the number of join operations increases in proportion to the number of containment relationships in the query and a join operation always occurs between large relations. In order to solve these problems, we propose in this paper a novel approach to extend the inverted index for containment query processing, and show its effectiveness through experimental results. In particular, our performance study shows that (1) our RDBMS approach almost always outperforms the previous RDBMS and IR approaches, (2) our RDBMS approach is not far behind our IR approach with respect to performance, and (3) our approach is scalable to the number of containment relationships in queries. Therefore, our results suggest that, without having to make any modifications on the RDBMS engine, a native implementation using an RDBMS can support containment queries as efficiently as an IR implementation. (C) 2002 Elsevier Science B.V. All rights reserved.
引用
收藏
页码:11 / 22
页数:12
相关论文
共 42 条
  • [1] ARNOLDMOORE T, 1995, P AUSTR DAT C
  • [2] BAEZAYATES R, 1996, ACM SIGMOD RECORD, V25, P67
  • [3] BAEZAYATES RA, 1999, MODERN INFORMATION R
  • [4] BAYER R, 2001, XML DATABASE MODELLI
  • [5] BLAKE GE, 1994, P INT C APPL DAT
  • [6] BRAY T, 1998, EXTENSIBLE MARKUP LA
  • [7] Chamberlin D., 2001, XQUERY QUERY LANGUAG
  • [8] Clark J., 1999, XML PATH LANGUAGE XP
  • [9] AN ALGEBRA FOR STRUCTURED TEXT SEARCH AND A FRAMEWORK FOR ITS IMPLEMENTATION
    CLARKE, CLA
    CORMACK, GV
    BURKOWSKI, FJ
    [J]. COMPUTER JOURNAL, 1995, 38 (01) : 43 - 56
  • [10] CLARKE CLA, 1995, 4 ANN S DOC AN INF