Parallel query processing in a polystore

Cited by: 0

Authors:

Pavlos Kranas

Boyan Kolev

Oleksandra Levchenko

Esther Pacitti

Patrick Valduriez

Ricardo Jiménez-Peris

Marta Patiño-Martinez

Affiliations:

[1] LeanXcale

[2] Distributed Systems Lab at Universidad Politécnica de Madrid

[3] Inria

[4] University of Montpellier

[5] CNRS

[6] LIRMM

Source:

Distributed and Parallel Databases | 2021 / Vol. 39

Keywords:

Database integration; Heterogeneous databases; Distributed and parallel databases; Polystores; Query languages; Query processing;

DOI:

Not available

Abstract:

The blooming of different data stores has made polystores a major topic in the cloud and big data landscape. As the amount of data grows rapidly, it becomes critical to exploit the inherent parallel processing capabilities of underlying data stores and data processing platforms. To fully achieve this, a polystore should: (i) preserve the expressivity of each data store’s native query or scripting language and (ii) leverage a distributed architecture to enable parallel data integration, i.e. joins, on top of parallel retrieval of underlying partitioned datasets. In this paper, we address these points by: (i) using the polyglot approach of the CloudMdsQL query language that allows native queries to be expressed as inline scripts and combined with SQL statements for ad-hoc integration and (ii) incorporating the approach within the LeanXcale distributed query engine, thus allowing for native scripts to be processed in parallel at data store shards. In addition, (iii) efficient optimization techniques, such as bind join, can take place to improve the performance of selective joins. We evaluate the performance benefits of exploiting parallelism in combination with high expressivity and optimization through our experimental validation.
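The bind join mentioned in the abstract can be illustrated with a minimal sketch. This is not the CloudMdsQL or LeanXcale implementation; it only shows the general technique under simple assumptions: rows are Python dictionaries, and the remote data store is simulated by a hypothetical `fetch_orders` function that accepts a set of join keys (standing in for an IN-list pushed down to the store). All names and sample data are invented for illustration.

```python
from collections import defaultdict

def bind_join(left_rows, fetch_right, left_key, right_key):
    """Join left_rows with a remote source, retrieving only matching rows."""
    # Step 1: collect the distinct join keys from the already retrieved left side.
    keys = {row[left_key] for row in left_rows}
    if not keys:
        return []
    # Step 2: ship the keys to the other store so it filters before returning rows.
    right_rows = fetch_right(keys)
    # Step 3: finish with an ordinary in-memory hash join.
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[left_key], [])]

# Hypothetical sample data: a selective left side and a larger remote table.
CUSTOMERS = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 3, "name": "Lin"}]
ORDERS = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 3},
    {"order_id": 13, "cust_id": 1},
]

def fetch_orders(keys):
    # Stand-in for a native query with a pushed-down key filter.
    return [o for o in ORDERS if o["cust_id"] in keys]

result = bind_join(CUSTOMERS, fetch_orders, "cust_id", "cust_id")
for row in result:
    print(row)
```

In this sketch the order with `cust_id` 2 is never transferred, which is the point of the technique: when the left side is selective, only the matching fraction of the right side leaves its data store.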

Citation:

Pages: 939-977

Number of pages: 38
