Building and Operating a Large-Scale Enterprise Data Analytics Platform

Cited by: 6
Authors
Bauer, Daniel [1]
Froese, Florian [1]
Garces-Erice, Luis [1]
Giblin, Chris [1]
Labbi, Abdel [1]
Nagy, Zoltan A. [1]
Pardon, Niels [1]
Rooney, Sean [1]
Urbanetz, Peter [1]
Vetsch, Pascal [1]
Wespi, Andreas [1]
Affiliation
[1] IBM Research Europe, Säumerstr. 4, CH-8803 Rüschlikon, Switzerland
Keywords
Hybrid cloud; Datalake; Storage; Ingestion; SQL/Hadoop; Governance
DOI
10.1016/j.bdr.2020.100181
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Over the last three years we have been running a large-scale data processing platform on an OpenStack private cloud instance, for applying analytics to corporate data. Our platform makes a wide variety of corporate data assets (such as sales, marketing and customer information), as well as data from less conventional sources such as weather, news and social media, available for analytics to hundreds of globally distributed teams across the company. We control every layer in the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system. We describe our technical choices and explain how they evolved as we observed the actual workloads created by users. (C) 2020 The Authors. Published by Elsevier Inc.
Pages: 20