The Snowflake Elastic Data Warehouse

Cited by: 166
Authors
Dageville, Benoit [1]
Cruanes, Thierry [1]
Zukowski, Marcin [1]
Antonov, Vadim [1]
Avanes, Artin [1]
Bock, Jon [1]
Claybaugh, Jonathan [1]
Engovatov, Daniel [1]
Hentschel, Martin [1]
Huang, Jiansheng [1]
Lee, Allison W. [1]
Motivala, Ashish [1]
Munir, Abdul Q. [1]
Pelley, Steven [1]
Povinec, Peter [1]
Rahn, Greg [1]
Triantafyllis, Spyridon [1]
Unterbrunner, Philipp [1]
Affiliation
[1] Snowflake Comp, San Mateo, CA 94401 USA
Source
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2016
Keywords
Data warehousing; database as a service; multi-cluster shared data architecture;
DOI
10.1145/2882903.2903741
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We live in the golden age of distributed computing. Public cloud platforms now offer virtually unlimited compute and storage resources on demand. At the same time, the Software-as-a-Service (SaaS) model brings enterprise-class systems to users who previously could not afford such systems due to their cost and complexity. Alas, traditional data warehousing systems are struggling to fit into this new environment. For one thing, they have been designed for fixed resources and are thus unable to leverage the cloud's elasticity. For another thing, their dependence on complex ETL pipelines and physical tuning is at odds with the flexibility and freshness requirements of the cloud's new types of semi-structured data and rapidly evolving workloads. We decided a fundamental redesign was in order. Our mission was to build an enterprise-ready data warehousing solution for the cloud. The result is the Snowflake Elastic Data Warehouse, or "Snowflake" for short. Snowflake is a multi-tenant, transactional, secure, highly scalable and elastic system with full SQL support and built-in extensions for semi-structured and schema-less data. The system is offered as a pay-as-you-go service in the Amazon cloud. Users upload their data to the cloud and can immediately manage and query it using familiar tools and interfaces. Implementation began in late 2012 and Snowflake has been generally available since June 2015. Today, Snowflake is used in production by a growing number of small and large organizations alike. The system runs several million queries per day over multiple petabytes of data. In this paper, we describe the design of Snowflake and its novel multi-cluster, shared-data architecture. The paper highlights some of the key features of Snowflake: extreme elasticity and availability, semi-structured and schema-less data, time travel, and end-to-end security. It concludes with lessons learned and an outlook on ongoing work.
Pages: 215-226
Number of pages: 12