Query grouping-based multi-query optimization framework for interactive SQL query engines on Hadoop

Cited by: 3
Authors
Chen, Ling [1 ,2 ]
Lin, Yan [1 ]
Wang, Jingchang [3 ]
Huang, Heqing [1 ]
Chen, Donghui [1 ]
Wu, Yong [3 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[2] Alibaba Zhejiang Univ, Joint Inst Frontier Technol, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Hongcheng Comp Syst Co Ltd, Hangzhou 310053, Zhejiang, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
grouping method; Impala system; multi-query optimization; genetic algorithm
DOI
10.1002/cpe.4676
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835
Abstract
In the past few years, executing high-concurrency queries with interactive SQL query engines on Hadoop has become an important activity for many organizations. However, these systems do not adopt Multi-Query Optimization (MQO) to accelerate the process. There are two major concerns. First, traditional MQO research assumes that the queries being optimized are highly similar, yet these systems usually serve a variety of applications. Although queries from the same application are highly similar, queries from different applications may have low similarity, so applying traditional MQO is inefficient and time consuming. Second, integrating MQO may require extensive system modifications. To integrate MQO into interactive SQL query engines on Hadoop efficiently, a query grouping-based MQO framework is proposed. A lightweight mechanism is used to represent SQL queries, on which a grouping method is exploited to speed up the optimization process. A cost model is integrated to estimate the execution cost of interactive SQL query engines on Hadoop. Using the proposed framework, we modify the Impala system to support MQO, and experimental results on TPC-DS show significant performance improvements.
Pages: 16
Related papers
50 entries in total
  • [21] Multi-query optimization for on-line analytical processing
    Kalnis, P
    Papadias, D
    INFORMATION SYSTEMS, 2003, 28 (05) : 457 - 473
  • [22] Query scheduling in multi query optimization
    Gupta, A
    Sudarshan, S
    Vishwanathan, S
    2001 INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2001, : 11 - 19
  • [23] Demand-based Sensor Data Gathering with Multi-Query Optimization
    Hulsmann, Julius
    Traub, Jonas
    Markl, Volker
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (12): : 2801 - 2804
  • [24] On Multi-Query Local Community Detection
    Bian, Yuchen
    Yan, Yaowei
    Cheng, Wei
    Wang, Wei
    Luo, Dongsheng
    Zhang, Xiang
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 9 - 18
  • [25] Multi-Query Stream Processing on FPGAs
    Sadoghi, Mohammad
    Javed, Rija
    Tarafdar, Naif
    Singh, Harsh
    Palaniappan, Rohan
    Jacobsen, Hans-Arno
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 1229 - 1232
  • [26] Deep multi-query video retrieval
    Akbacak E.
    Vural C.
    Journal of Visual Communication and Image Representation, 2022, 85
  • [27] Multi-Query Person Search with Transformers
    Chen, Ying
    Li, Zhihui
    Song, Andy
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT IV, PAKDD 2024, 2024, 14648 : 116 - 128
  • [28] Q-Graph: Preserving Query Locality in Multi-Query Graph Processing
    Mayer, Christian
    Mayer, Ruben
    Grunert, Jonas
    Rothermel, Kurt
    Tariq, Muhammad Adnan
    GRADES-NDA '18: PROCEEDINGS OF THE 1ST ACM SIGMOD JOINT INTERNATIONAL WORKSHOP ON GRAPH DATA MANAGEMENT EXPERIENCES & SYSTEMS (GRADES) AND NETWORK DATA ANALYTICS (NDA) 2018 (GRADES-NDA 2018), 2018,
  • [29] Multi-Query Optimization in Federated Databases using Evolutionary Algorithm
    Mansha, Sameen
    Kamiran, Faisal
    2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2015, : 723 - 726
  • [30] Multi-Query Optimization in Wide-Area Streaming Analytics
    Jonathan, Albert
    Chandra, Abhishek
    Weissman, Jon
    PROCEEDINGS OF THE 2018 ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '18), 2018, : 412 - 425