Scalable keyword search on large data streams

Cited by: 0
Authors
Lu Qin
Jeffrey Xu Yu
Lijun Chang
Affiliation
[1] The Chinese University of Hong Kong
Source
The VLDB Journal, 2011, Vol. 20
Keywords
Keyword search; Relational databases; Data streams
DOI
Not available
Abstract
It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high-quality services. Along this direction, IR-style m-keyword query processing over a relational database in an RDBMS framework has been well studied. Such processing finds all hidden interconnected tuple structures, for example, connected trees whose tuples contain the keywords and are interconnected by sequences of primary/foreign key relationships. A new and challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query, where the relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem lies in the number of costly joins that must be processed over time as tuples are inserted and/or deleted. This cost is mainly affected by three parameters: the number of keywords, the maximum size of the interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed to answer an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high-speed relational data stream. We show that high efficiency can be achieved by significantly reducing the number of intermediate results produced when processing joins over a relational data stream. The proposed techniques achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.
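To make the problem setting concrete, the following is a minimal sketch of what "monitoring an m-keyword query over a stream of tuple insertions/deletions" means. It is a naive, exhaustive baseline written for illustration, not the join-determination algorithm or the demand-driven approach proposed in the paper. It assumes that a result is a set of at most `tmax` tuples whose primary/foreign key links form a connected graph, that together contain every keyword, and from which no single tuple can be dropped without losing connectivity or keyword coverage. The class `KeywordStreamMonitor`, its methods `insert`/`delete`, and the parameter `tmax` are hypothetical names.

```python
from itertools import combinations


class KeywordStreamMonitor:
    """Naive monitor for an m-keyword query over tuple inserts/deletes.

    Hypothetical illustration only (not the paper's algorithm): a result is a
    set of at most `tmax` tuples that is connected via key joins, contains all
    query keywords, and is minimal (no tuple can be dropped).
    """

    def __init__(self, keywords, tmax):
        self.keywords = {k.lower() for k in keywords}
        self.tmax = tmax
        self.text = {}        # tuple id -> set of lower-cased words
        self.adj = {}         # tuple id -> ids joined by a primary/foreign key
        self.results = set()  # current results, as frozensets of tuple ids

    def _connected(self, nodes):
        # Depth-first search restricted to the candidate tuple set.
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in self.adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen == nodes

    def _covers(self, nodes):
        words = set().union(*(self.text[t] for t in nodes))
        return self.keywords <= words

    def _is_result(self, nodes):
        if not (self._connected(nodes) and self._covers(nodes)):
            return False
        # Minimality: dropping any single tuple must break connectivity or
        # keyword coverage (single drops suffice for connected tuple sets).
        for t in nodes:
            rest = nodes - {t}
            if rest and self._connected(rest) and self._covers(rest):
                return False
        return True

    def insert(self, tid, words, joins=()):
        """Insert tuple `tid` (joined to existing tuples); return new results."""
        self.text[tid] = {w.lower() for w in words}
        self.adj[tid] = set(joins)
        for other in joins:  # assumes every joined tuple already exists
            self.adj[other].add(tid)
        others = [t for t in self.text if t != tid]
        new = set()
        # Earlier results are unaffected, so every new result contains `tid`.
        for size in range(self.tmax):
            for combo in combinations(others, size):
                nodes = frozenset(combo) | {tid}
                if self._is_result(nodes):
                    new.add(nodes)
        self.results |= new
        return new

    def delete(self, tid):
        """Delete tuple `tid`; return the results that expire with it."""
        for other in self.adj.pop(tid):
            self.adj[other].discard(tid)
        del self.text[tid]
        gone = {r for r in self.results if tid in r}
        self.results -= gone
        return gone
```

For example, with keywords `["database", "stream"]` and `tmax=3`, inserting `a1` ("database systems"), then `p1` ("paper") joined to `a1`, then `a2` ("data stream") joined to `p1` reports the single result `{a1, p1, a2}` only on the third insertion, and deleting `p1` expires it. Each insertion here re-enumerates every combination of up to `tmax - 1` existing tuples, which is exactly the kind of join and intermediate-result blow-up that the abstract identifies as the cost to be reduced.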
Pages: 35-57
Number of pages: 22