Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

被引:272
作者
Cormode, Graham [1 ]
Garofalakis, Minos [2 ]
Haas, Peter J. [3 ]
Jermaine, Chris [4 ]
机构
[1] AT&T Labs Res, 180 Pk Ave, Florham Pk, NJ 07932 USA
[2] Tech Univ Crete, Khania 73100, Greece
[3] IBM Almaden Res Ctr, San Jose, CA 95120 USA
[4] Rice Univ, Houston, TX 77005 USA
来源
FOUNDATIONS AND TRENDS IN DATABASES | 2011年 / 4卷 / 1-3期
基金
美国国家科学基金会;
关键词
D O I
10.1561/1900000004
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Methods for Approximate Query Processing (AQP) are essential for dealing with massive data. They are often the only means of providing interactive response times when exploring massive datasets, and are also needed to handle high speed data streams. These methods proceed by computing a lossy, compact synopsis of the data, and then executing the query of interest against the synopsis rather than the entire dataset. We describe basic principles and recent developments in AQP. We focus on four key synopses: random samples, histograms, wavelets, and sketches. We consider issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. We also discuss the tradeoffs between the different synopsis types.
引用
收藏
页码:1 / 294
页数:294
相关论文
共 293 条
  • [1] Abouzeid A, 2009, P VLDB, V2, P922
  • [2] Acharya S, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P275, DOI 10.1145/304181.304207
  • [3] Acharya S., 1999, INT C VER LARG DAT B
  • [4] ACHARYA S, 1999, ACM SIGMOD INT C MAN
  • [5] Aggarwal C., 2006, P 32 INT C VER LARG, P607
  • [6] Alon N., 1999, ACM PRINCIPLES DATAB
  • [7] Alon Noga, 1996, ACM S THEOR COMP
  • [8] Andoni A., 2010, CORR
  • [9] [Anonymous], 2003, P ACM SIGMOD 2003
  • [10] [Anonymous], 2006, P 32 INT C VER LARG