Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration

Cited by: 0

Authors:

Zhang, Hanbing [1]

Jing, Yinan [1]

He, Zhenying [1]

Zhang, Kai [1]

Wang, X. Sean [1]

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai 200437, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Measurement; Adaptation models; Costs; Tuners; Accuracy; Q-learning; Query processing; Optimization; Synthetic data; Approximate query processing; interactive data exploration; data analysis;

DOI:

10.1109/TKDE.2023.3341451

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For interactive data exploration, approximate query processing (AQP) is a useful approach that typically answers queries over samples, trading some query accuracy for a timely response. Existing AQP systems often materialize samples in memory for reuse to speed up query processing. How to tune the samples according to the workload is one of the key problems in AQP. However, because the data exploration workload is too complex to be predicted accurately, existing sample tuning approaches cannot adapt well to the changing workload. To address this problem, this paper proposes a deep reinforcement learning-based sample tuner, RL-STuner. When tuning samples, RL-STuner considers the workload changes from a global perspective and uses a Deep Q-learning Network (DQN) model to select an optimal sample set that has the maximum utility for the current workload. In addition, this paper proposes a set of optimization mechanisms to reduce the sample tuning cost. Experimental results on both real-world and synthetic datasets show that RL-STuner outperforms the existing sample tuning approaches and achieves 1.6x-5.2x improvements in query accuracy with a low tuning cost.
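As a rough illustration of the sample-based trade-off the abstract describes, the sketch below estimates an average from a uniform random sample and reports an approximate 95% error bound. It is not RL-STuner or any part of the paper's method; the function name `approximate_avg`, the synthetic `population`, and the normal-approximation bound are assumptions made for this example only.

```python
import math
import random


def approximate_avg(values, sample_size, seed=0):
    """Estimate AVG(values) from a uniform random sample (hypothetical helper).

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width based on the normal approximation.
    """
    rng = random.Random(seed)                  # fixed seed for repeatability
    sample = rng.sample(values, sample_size)   # uniform sample without replacement
    mean = sum(sample) / sample_size
    # Unbiased sample variance, then the standard error of the mean.
    var = sum((v - mean) ** 2 for v in sample) / (sample_size - 1)
    half_width = 1.96 * math.sqrt(var / sample_size)
    return mean, half_width


# Synthetic data standing in for a large table column (an assumption).
population = [float(i % 100) for i in range(100_000)]

exact = sum(population) / len(population)
estimate, bound = approximate_avg(population, sample_size=1_000)
bigger_estimate, bigger_bound = approximate_avg(population, sample_size=10_000)

print(f"exact={exact:.2f} estimate={estimate:.2f} +/- {bound:.2f}")
print(f"larger sample: estimate={bigger_estimate:.2f} +/- {bigger_bound:.2f}")
```

A larger sample costs more memory and scan time but narrows the error bound, which is why the choice of which samples to keep materialized matters for a changing workload.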

Pages: 6532-6546

Page count: 15
