Query grouping-based multi-query optimization framework for interactive SQL query engines on Hadoop

Cited by: 3
Authors
Chen, Ling [1, 2]
Lin, Yan [1 ]
Wang, Jingchang [3 ]
Huang, Heqing [1 ]
Chen, Donghui [1 ]
Wu, Yong [3 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[2] Alibaba Zhejiang Univ, Joint Inst Frontier Technol, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Hongcheng Comp Syst Co Ltd, Hangzhou 310053, Zhejiang, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
grouping method; Impala system; multi-query optimization; genetic algorithm
DOI
10.1002/cpe.4676
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In the past few years, executing high-concurrency queries with interactive SQL query engines on Hadoop has become an important activity for many organizations. However, these systems do not adopt Multi-Query Optimization (MQO) to accelerate the process. Two major concerns stand in the way. First, traditional MQO research assumes that the queries being optimized together are highly similar, whereas these systems usually serve a variety of applications. Although queries from the same application are highly similar, queries from different applications may have low similarity, so applying traditional MQO is inefficient and time consuming. Second, integrating MQO may require extensive system modifications. To integrate MQO into interactive SQL query engines on Hadoop efficiently, a query grouping-based MQO framework is proposed. A lightweight mechanism is used to represent SQL queries, on which a grouping method is exploited to speed up the optimization process. A cost model is integrated to estimate the execution cost of interactive SQL query engines on Hadoop. Using the proposed framework, we modify the Impala system to support MQO, and the experimental results on TPC-DS show significant performance improvements.
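The abstract does not say how queries are represented or grouped. Purely as an illustration of the general idea of grouping queries before optimization, the sketch below assumes each query is summarized by the set of tables it scans and groups queries greedily by Jaccard similarity. The signature choice, the `threshold` value, and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of similarity-based query grouping.
# This is NOT the paper's algorithm; signatures and threshold are assumptions.

def jaccard(a, b):
    """Jaccard similarity of two sets; treated as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_queries(signatures, threshold=0.5):
    """Greedily place each query in the first group whose representative
    (the group's first member) is at least `threshold` similar to it."""
    groups = []  # each group is a list of query ids
    for qid, sig in signatures.items():
        for group in groups:
            if jaccard(sig, signatures[group[0]]) >= threshold:
                group.append(qid)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([qid])
    return groups


# Assumed signatures: the set of tables each query scans.
signatures = {
    "q1": {"store_sales", "date_dim", "item"},
    "q2": {"store_sales", "date_dim", "store"},
    "q3": {"web_sales", "customer"},
    "q4": {"web_sales", "customer", "customer_address"},
}
groups = group_queries(signatures)
print(groups)
```

Under these assumptions, an MQO step would then search for shared work only within each group rather than across all concurrent queries, which is the kind of saving the abstract attributes to grouping.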
Pages: 16