Approximate query processing using wavelets

被引:0
作者
Chakrabarti, K
Garofalakis, M
Rastogi, R
Shim, K
机构
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Bell Labs, Murray Hill, NJ 07974 USA
[3] Korea Adv Inst Sci & Technol, Taejon 305701, South Korea
[4] AITrc, Taejon 305701, South Korea
关键词
query processing; data synopses; approximate query answers; wavelet decomposition;
D O I
暂无
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner, Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude, and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data.
引用
收藏
页码:199 / 223
页数:25
相关论文
共 50 条
  • [1] Approximate query processing using wavelets
    Chakrabarti K.
    Garofalakis M.
    Rastogi R.
    Shim K.
    The VLDB Journal, 2001, 10 (2) : 199 - 223
  • [2] Bounded Approximate Query Processing
    Li, Kaiyu
    Zhang, Yong
    Li, Guoliang
    Tao, Wenbo
    Yan, Ying
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (12) : 2262 - 2276
  • [3] Approximate Query Processing with Error Guarantees
    Ni, Tianjia
    Sugiura, Kento
    Ishikawa, Yoshiharu
    Lu, Kejing
    BIG-DATA-ANALYTICS IN ASTRONOMY, SCIENCE, AND ENGINEERING, BDA 2021, 2022, 13167 : 268 - 278
  • [4] Optimized stratified sampling for approximate query processing
    Chaudhuri, Surajit
    Das, Gautam
    Narasayya, Vivek
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2007, 32 (02):
  • [5] Approximate query processing model for mobile computing
    Madria, SK
    Mohania, M
    Roddick, JF
    INFORMATION ORGANIZATION AND DATABASES: FOUNDATIONS OF DATA ORGANIZATION, 2000, 579 : 207 - 219
  • [6] Learned Optimizer for Online Approximate Query Processing in Data Exploration
    Liu, Liyuan
    Zhang, Hanbing
    Jing, Yinan
    He, Zhenying
    Zhang, Kai
    Wang, X. Sean
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (08) : 3977 - 3991
  • [7] An Online Approximate Aggregation Query Processing Method Based on Hadoop
    Zhang, Zhiqiang
    Hu, Jianghua
    Xie, Xiaoqin
    Pan, Haiwei
    Feng, Xiaoning
    2016 IEEE 20TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2016, : 117 - 122
  • [8] Approximate query processing using multilayered data model to handle environmental constraints, privacy and avoiding inferences
    Narayanan, Muthukumar
    Madria, Sanjay Kumar
    Clair, Dan St.
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2007, 16 (02) : 177 - 228
  • [9] Improved Centralized XML Query Processing Using Distributed Query Workload
    Subramaniam, Samini
    Haw, Su-Cheng
    Soon, Lay-Ki
    IEEE ACCESS, 2021, 9 : 29127 - 29142
  • [10] Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration
    Zhang, Hanbing
    Jing, Yinan
    He, Zhenying
    Zhang, Kai
    Wang, X. Sean
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 6532 - 6546