Scalable keyword search on large data streams

被引:14
作者
Qin, Lu [1 ]
Yu, Jeffrey Xu [1 ]
Chang, Lijun [1 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
关键词
Keyword search; Relational databases; Data streams;
D O I
10.1007/s00778-010-0190-x
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high quality services. Along this direction, IR-styled m-keyword query processing over a relational database in an rdbms framework has been well studied. It finds all hidden interconnected tuple structures, for example connected trees that contain keywords and are interconnected by sequences of primary/foreign key relationships among tuples. A new challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem is related to the number of costly joins to be processed over time when tuples are inserted and/or deleted. Such cost is mainly affected by three parameters, namely, the number of keywords, the maximum size of interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed for answering an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high speed relational data stream. We show that we can achieve high efficiency by significantly reducing the number of intermediate results when processing joins over a relational data stream. The proposed new techniques allow us to achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic data and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.
引用
收藏
页码:35 / 57
页数:23
相关论文
共 31 条
  • [1] Agrawal S., 2002, P ICDE 02
  • [2] AYAD A, 2004, P ACM SIGMOD INT C M, P419
  • [3] AYAD A, 2006, P ICDE ATL GEORG, P142
  • [4] Balmin A., 2004, P VLDB 04
  • [5] USING SEMI-JOINS TO SOLVE RELATIONAL QUERIES
    BERNSTEIN, PA
    CHIU, DMW
    [J]. JOURNAL OF THE ACM, 1981, 28 (01) : 25 - 40
  • [6] Bhalotia G., 2002, P ICDE 02
  • [7] Chowdhury, 2006, P SIGMOD 06
  • [8] DALVI BB, 2008, P VLDB ENDOWMENT, V1, P1189
  • [9] DAS A, 2003, P 2003 ACM SIGMOD IN, P40
  • [10] Ding B., 2007, P ICDE 07