Making Sense of Big Data with the Berkeley Data Analytics Stack

Cited by: 13

Authors:

Franklin, Michael [1]

Affiliations:

[1] Univ Calif Berkeley, Algorithms Machines & People Lab AMPLab, Berkeley, CA 94720 USA

Source:

WSDM'15: PROCEEDINGS OF THE EIGHTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2015

Keywords:

Big Data;

DOI:

10.1145/2684822.2685326

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The Berkeley AMPLab is creating a new approach to data analytics. Launched in early 2011, the lab aims to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and in crowds). The lab is realizing its ideas through the development of a freely available open source software stack called BDAS: the Berkeley Data Analytics Stack. In the four years the lab has been in operation, we've released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support advanced analytics and to make people a full-fledged resource for making sense of data. In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS with an emphasis on our newest efforts, including some or all of: the GraphX graph processing system, the Velox and MLBase machine learning platforms, and the SampleClean framework for hybrid human/computer data cleaning. Finally, I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost, and quality requirements throughout the analytics lifecycle.
