Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration

Cited by: 0

Authors:

Zhang, Hanbing [1]

Jing, Yinan [1]

He, Zhenying [1]

Zhang, Kai [1]

Wang, X. Sean [1]

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai 200437, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Measurement; Adaptation models; Costs; Tuners; Accuracy; Q-learning; Query processing; Optimization; Synthetic data; Approximate query processing; interactive data exploration; data analysis;

DOI:

10.1109/TKDE.2023.3341451

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For interactive data exploration, approximate query processing (AQP) is a useful approach that typically uses samples to answer queries quickly at the cost of some query accuracy. Existing AQP systems often materialize samples in memory for reuse to speed up query processing. How to tune the samples according to the workload is one of the key problems in AQP. However, because the data exploration workload is too complex to predict accurately, existing sample tuning approaches do not adapt well to a changing workload. To address this problem, this paper proposes a deep reinforcement learning-based sample tuner, RL-STuner. When tuning samples, RL-STuner considers workload changes from a global perspective and uses a Deep Q-learning Network (DQN) model to select an optimal sample set that has the maximum utility for the current workload. In addition, this paper proposes a set of optimization mechanisms to reduce the sample tuning cost. Experimental results on both real-world and synthetic datasets show that RL-STuner outperforms existing sample tuning approaches and achieves 1.6x-5.2x improvements in query accuracy with a low tuning cost.
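
The accuracy-for-speed trade-off mentioned in the abstract can be illustrated with a minimal sketch, assuming a uniform random sample and a plain AVG query. This is not RL-STuner or any method from the paper; the helper name `approximate_avg`, the synthetic column, and the sample size are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def approximate_avg(values, sample_size, seed=0):
    """Estimate AVG(values) from a uniform random sample (hypothetical helper).

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # uniform, without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_err


# Synthetic column; an assumption for illustration only.
data_rng = random.Random(42)
column = [data_rng.gauss(100.0, 15.0) for _ in range(100_000)]

exact = statistics.fmean(column)  # full scan
estimate, half_width = approximate_avg(column, sample_size=1_000)  # 1% sample
print(f"exact={exact:.2f} estimate={estimate:.2f} +/- {half_width:.2f}")
```

In this sketch the half-width shrinks roughly in proportion to one over the square root of the sample size, so a larger materialized sample gives a tighter answer but costs more memory and scan time. Choosing which samples to keep as the workload changes is the tuning problem the abstract describes.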

Pages: 6532-6546

Page count: 15
