Efficient Keyword-Based Search for Top-K Cells in Text Cube

Cited by: 7
Authors
Ding, Bolin [1]
Zhao, Bo [1]
Lin, Cindy Xide [1]
Han, Jiawei [1]
Zhai, Chengxiang [1]
Srivastava, Ashok [2]
Oza, Nikunj C. [2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] NASA, Ames Res Ctr, Intelligent Syst Div, Moffett Field, CA 94035 USA
Funding
National Science Foundation (United States)
Keywords
Keyword search; multidimensional text data; data cube;
DOI
10.1109/TKDE.2011.34
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
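The cell and top-k notions described in the abstract can be illustrated with a small brute-force sketch. This is not one of the paper's four algorithms: it simply enumerates every cell and ranks them. The relevance model (keyword token fraction per document, averaged over the documents in a cell), the `min_support` threshold, and all function and field names below are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations
import heapq


def doc_relevance(text, keywords):
    """Toy relevance (assumed): fraction of document tokens that are query keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    wanted = {k.lower() for k in keywords}
    return sum(t in wanted for t in tokens) / len(tokens)


def top_k_cells(rows, dimensions, keywords, k, min_support=1):
    """Enumerate all cells, score each by average document relevance, return the top k.

    rows: list of (attribute dict, document text) pairs.
    A cell is a tuple of (dimension, value) pairs over a subset of dimensions;
    the empty tuple is the apex cell that aggregates every document.
    """
    cell_scores = defaultdict(list)
    for attrs, text in rows:
        score = doc_relevance(text, keywords)
        # Each document belongs to one cell per subset of dimensions.
        for r in range(len(dimensions) + 1):
            for dims in combinations(dimensions, r):
                cell = tuple((d, attrs[d]) for d in dims)
                cell_scores[cell].append(score)
    ranked = [
        (sum(scores) / len(scores), cell)
        for cell, scores in cell_scores.items()
        if len(scores) >= min_support  # skip cells with too few documents
    ]
    return heapq.nlargest(k, ranked, key=lambda pair: pair[0])
```

Each document falls into 2^d cells for d dimensions, so this exhaustive enumeration grows quickly with dimensionality. That cost is the motivation for approaches such as search-space ordering, which the abstract says explores only a small portion of the text cube.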
Pages: 1795-1810
Page count: 16