Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics

Cited by: 65

Authors:

Ramakrishnan, Raghu ^{[1]}

Sridharan, Baskar ^{[1]}

Douceur, John R. ^{[1]}

Kasturi, Pavan ^{[1]}

Krishnamachari-Sampath, Balaji ^{[1]}

Krishnamoorthy, Karthick ^{[1]}

Li, Peng ^{[1]}

Manu, Mitica ^{[1]}

Michaylov, Spiro ^{[1]}

Ramos, Rogerio ^{[1]}

Sharman, Neil ^{[1]}

Xu, Zee ^{[1]}

Barakat, Youssef ^{[1]}

Douglas, Chris ^{[1]}

Draves, Richard ^{[1]}

Naidu, Shrikant S. ^{[2]}

Shastry, Shankar ^{[2]}

Sikaria, Atul ^{[1]}

Sun, Simon ^{[1]}

Venkatesan, Ramarathnam ^{[1]}

Affiliations:

[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA

[2] 9 Vigyan, Lavelle Rd, Floors Gr, 2 & 3, Bangalore 560001, Karnataka, India

Source:

SIGMOD'17: Proceedings of the 2017 ACM International Conference on Management of Data | 2017

Keywords:

Storage; HDFS; Hadoop; map-reduce; distributed file system; tiered storage; cloud service; Azure; AWS; GCE; Big Data

DOI:

10.1145/3035918.3056100

Chinese Library Classification (CLC):

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Azure Data Lake Store (ADLS) is a fully-managed, elastic, scalable, and secure file system that supports Hadoop Distributed File System (HDFS) and Cosmos semantics. It is specifically designed and optimized for a broad spectrum of Big Data analytics that depend on a very high degree of parallel reads and writes, as well as collocation of compute and data for high-bandwidth, low-latency access. It brings together key components and features of Microsoft's Cosmos file system (long used internally at Microsoft as the warehouse for data and analytics) and HDFS, and is a unified file storage solution for analytics on Azure. Internal and external workloads run on this unified platform. Distinguishing aspects of ADLS include its support for multiple storage tiers, exabyte scale, and comprehensive security and data sharing. We discuss ADLS architecture, design points, and performance.


Pages: 51-63

Page count: 13
