Coral: federated query join order optimization based on deep reinforcement learning

Cited by: 2
Authors
Gu, Rong [1]
Zhang, Yi [1]
Yin, Liangliang [1]
Song, Lingyi [1]
Huang, Wenjie [1]
Yuan, Chunfeng [1]
Wang, Zhaokang [1, 2]
Zhu, Guanghui [1]
Huang, Yihua [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2023 / Vol. 26 / Issue 05
Funding
National Science Foundation (USA);
Keywords
Join order; Query optimization; Federated query; Deep reinforcement learning;
DOI
10.1007/s11280-023-01156-0
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The rise of diversified data engines has created the need for federated queries. A federated query performs data analysis over data drawn from multiple data engines. Because the data originates from multiple engines, federated queries usually rely on join operations and data migration to complete, and they take a long time. The challenges of optimizing federated queries lie in join order selection and data migration coordination. However, enumerating all join orders is impractical because the set of join orders grows exponentially with the number of relations to be joined. To improve the performance of federated queries, we present a deep reinforcement learning-based approach to optimizing join order and join engine selection for federated queries and design a deep Q-network-based (DQN-based) optimizer. The DQN-based optimizer can generate join search policies that optimize join order selection for datasets with a given cost model. Based on the DQN-based optimizer, we implement a federated query system, Coral, which optimizes join order selection for federated queries. With the optimized join order, Coral transforms a federated query into a set of subqueries, which are assigned to and executed on different data engines. We also propose a subquery cache optimization to optimize data migration during query execution. Extensive experimental evaluation demonstrates that Coral can significantly reduce the latency of federated queries and achieve a speedup of up to 5.03x compared to cutting-edge federated query systems.
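To make the abstract's remark about the growth of the join-order search space concrete, the short sketch below counts candidate join orders for n relations. It is not taken from the paper: the function names are hypothetical, and it uses the standard textbook counts (n! left-deep orders, and (2n-2)!/(n-1)! ordered bushy join trees) under the assumption that every pair of relations may be joined. It also ignores the additional choice of join engine that Coral optimizes.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders for n relations: n!."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Number of ordered bushy join trees for n relations.

    Equals n! * Catalan(n - 1), which simplifies to (2n - 2)! / (n - 1)!.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


if __name__ == "__main__":
    # Print how quickly both search spaces grow with the number of relations.
    for n in (3, 5, 10, 15):
        print(f"n={n}: left-deep={left_deep_orders(n)}, bushy={bushy_orders(n)}")
```

Even at ten relations the bushy space already exceeds ten billion plans under these assumptions, which is the kind of growth that motivates a learned search policy over exhaustive enumeration.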
Pages: 3093-3118
Number of pages: 26