Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

Cited by: 78
Authors
Begoli, Edmon [1]
Camacho-Rodriguez, Jesus [2]
Hyde, Julian [2]
Mior, Michael J. [3]
Lemire, Daniel [4]
Affiliations
[1] Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
[2] Hortonworks Inc., Santa Clara, CA, USA
[3] University of Waterloo, David R. Cheriton School of Computer Science, Waterloo, ON, Canada
[4] Université du Québec (TÉLUQ), Montreal, QC, Canada
Source
SIGMOD'18: Proceedings of the 2018 International Conference on Management of Data, 2018
Keywords
Apache Calcite; Relational Semantics; Data Management; Query Algebra; Modular Query Optimization; Storage Adapters
DOI
10.1145/3183713.3190662
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. The goal of this paper is to formally introduce Calcite to the broader research community, briefly present its history, and describe its architecture, features, functionality, and patterns for adoption. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
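The abstract describes an optimizer assembled from independent, pluggable rewrite rules. The sketch below is a hypothetical and heavily simplified illustration of that idea in Python; it is not Calcite's code. Calcite itself is written in Java, and its real planner API (for example `RelOptRule`, `RelNode`, `HepPlanner` and `VolcanoPlanner`) is far richer and includes cost-based search. Every name here (`Scan`, `Filter`, `Project`, `merge_filters`, `remove_true_filter`, `apply_once`, `optimize`) is an assumption made for illustration, and predicates are plain strings rather than expression trees.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


# Hypothetical miniature relational algebra (not Calcite's RelNode hierarchy).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: "Node"
    predicate: str  # simplified: a predicate is just a SQL-like string


@dataclass(frozen=True)
class Project:
    child: "Node"
    columns: tuple


Node = Union[Scan, Filter, Project]

# A rule inspects one node and returns a rewritten node, or None if it does not match.
Rule = Callable[[Node], Optional[Node]]


def merge_filters(node: Node) -> Optional[Node]:
    """Hypothetical rule: Filter(Filter(x, p1), p2) -> Filter(x, (p1) AND (p2))."""
    if isinstance(node, Filter) and isinstance(node.child, Filter):
        inner = node.child
        return Filter(inner.child, f"({inner.predicate}) AND ({node.predicate})")
    return None


def remove_true_filter(node: Node) -> Optional[Node]:
    """Hypothetical rule: Filter(x, TRUE) -> x."""
    if isinstance(node, Filter) and node.predicate == "TRUE":
        return node.child
    return None


def apply_once(node: Node, rules: list[Rule]) -> Node:
    """Rewrite the children first, then fire the first matching rule on this node."""
    if isinstance(node, Filter):
        node = Filter(apply_once(node.child, rules), node.predicate)
    elif isinstance(node, Project):
        node = Project(apply_once(node.child, rules), node.columns)
    for rule in rules:
        rewritten = rule(node)
        if rewritten is not None:
            return rewritten
    return node


def optimize(node: Node, rules: list[Rule], max_passes: int = 100) -> Node:
    """Repeat bottom-up passes until the plan stops changing (a fixpoint)."""
    for _ in range(max_passes):  # cap guards against rule sets that never converge
        rewritten = apply_once(node, rules)
        if rewritten == node:  # frozen dataclasses compare structurally
            return rewritten
        node = rewritten
    return node


if __name__ == "__main__":
    plan = Project(
        Filter(Filter(Filter(Scan("emp"), "TRUE"), "deptno = 10"), "sal > 1000"),
        ("name",),
    )
    print(optimize(plan, [merge_filters, remove_true_filter]))
```

With both rules registered, the example plan collapses to a single `Filter` with the predicate `(deptno = 10) AND (sal > 1000)` over `Scan("emp")`, beneath the unchanged `Project`. Extending this toy optimizer only means appending another function to the rule list; the driver does not change. Because it applies rules greedily with no cost model, it resembles heuristic rule application rather than cost-based optimization.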
Pages: 221–230
Number of pages: 10