Materialized sample views for database approximation

被引:18
作者
Joshi, Shantanu [1 ]
Jermaine, Christopher [2 ]
机构
[1] Oracle Corp, Server Manageabil Grp, Redwood Shores, CA 94065 USA
[2] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
基金
美国国家科学基金会;
关键词
indexing methods; query processing; sampling;
D O I
10.1109/TKDE.2007.190664
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider the problem of creating a sample view of a database table. A sample view is an indexed materialized view that permits efficient sampling from an arbitrary range query over the view. Such "sample views" are very useful in applications that require random samples from a database: Approximate query processing, online aggregation, data mining, and randomized algorithms are a few examples. Our core technical contribution is a new file organization called the Appendability, Combinability, and Exponentiality (ACE) Tree that is suitable for organizing and indexing a sample view. One of the most important aspects of the ACE Tree is that it supports online random sampling from the view. That is, at all times, the set of records returned by the ACE Tree constitutes a statistically random sample of the database records satisfying the relational selection predicate over the view. Our paper presents experimental results that demonstrate the utility of the ACE Tree.
引用
收藏
页码:337 / 351
页数:15
相关论文
共 25 条
  • [1] [Anonymous], 1999, P 5 ACM SIGKDD INT C
  • [2] [Anonymous], P 1997 ACM SIGMOD IN
  • [3] ANTOSHENKOV G, 1992, PROC INT CONF VERY L, P375
  • [4] Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9
  • [5] Brown P.G., 2006, ICDE, P6
  • [6] Chaudhuri Surajit, 2004, P 2004 ACM SIGMOD IN, P287, DOI [10.1145/1007568.1007602, DOI 10.1145/1007568.1007602]
  • [7] Diwan AA, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P342
  • [8] Garcia-Molina H., 1999, DATABASE SYSTEM IMPL
  • [9] Guttman A., 1984, SIGMOD Record, V14, P47, DOI 10.1145/971697.602266
  • [10] Haas P. J., 2004, P ACM SIGMOD INT C M, P275, DOI DOI 10.1145/1007568.1007601