Query grouping-based multi-query optimization framework for interactive SQL query engines on Hadoop

Cited by: 3
Authors
Chen, Ling [1, 2]
Lin, Yan [1]
Wang, Jingchang [3]
Huang, Heqing [1]
Chen, Donghui [1]
Wu, Yong [3]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[2] Alibaba Zhejiang Univ, Joint Inst Frontier Technol, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Hongcheng Comp Syst Co Ltd, Hangzhou 310053, Zhejiang, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
grouping method; Impala system; multi-query optimization; genetic algorithm;
DOI
10.1002/cpe.4676
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
In the past few years, executing high-concurrency queries with interactive SQL query engines on Hadoop has become an important activity for many organizations. However, these systems do not adopt Multi-Query Optimization (MQO) to accelerate query processing, for two main reasons. First, traditional MQO research assumes that the queries being optimized together are highly similar, whereas these systems usually serve a variety of applications. Although queries from the same application are highly similar, queries from different applications may have low similarity, so applying traditional MQO is inefficient and time-consuming. Second, integrating MQO may require extensive system modifications. To integrate MQO into interactive SQL query engines on Hadoop efficiently, a query grouping-based MQO framework is proposed. A lightweight mechanism represents SQL queries, and a grouping method built on this representation speeds up the optimization process. A cost model is integrated to estimate the execution cost of interactive SQL query engines on Hadoop. Using the proposed framework, we modify the Impala system to support MQO, and experimental results on TPC-DS show significant performance improvements.
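The abstract does not specify the query representation or the grouping method, so the following is only an illustrative sketch of the general idea of grouping queries before multi-query optimization. Everything in it is an assumption made for illustration: the table-set signature, the Jaccard similarity measure, the greedy grouping, the 0.5 threshold, and the names `table_set`, `jaccard`, and `group_queries`. None of these is taken from the paper.

```python
import re

def table_set(sql):
    """Hypothetical lightweight signature: names that follow FROM or JOIN."""
    names = re.findall(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", sql, flags=re.IGNORECASE)
    return frozenset(name.lower() for name in names)

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_queries(queries, threshold=0.5):
    """Greedy grouping: put each query in the first group whose
    representative signature is similar enough, else start a new group."""
    groups = []
    for index, sql in enumerate(queries):
        signature = table_set(sql)
        for group in groups:
            if jaccard(signature, group["rep"]) >= threshold:
                group["members"].append(index)
                break
        else:
            # No existing group is similar enough; this query seeds a new one.
            groups.append({"rep": signature, "members": [index]})
    return [group["members"] for group in groups]

queries = [
    "SELECT * FROM store_sales JOIN item ON ss_item_sk = i_item_sk",
    "SELECT count(*) FROM store_sales JOIN item ON ss_item_sk = i_item_sk "
    "JOIN date_dim ON ss_sold_date_sk = d_date_sk",
    "SELECT * FROM web_returns",
    "select wr_item_sk from web_returns join customer on wr_refunded_customer_sk = c_customer_sk",
]
print(group_queries(queries))
```

In a sketch like this, only queries inside the same group would be handed to an MQO step together, which is one way to avoid searching for shared work across queries from unrelated applications.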
Pages: 16