Making Sense of Big Data with the Berkeley Data Analytics Stack

Cited by: 13
Authors
Franklin, Michael [1]
Affiliations
[1] Univ Calif Berkeley, Algorithms Machines & People Lab AMPLab, Berkeley, CA 94720 USA
Keywords
Big Data;
DOI
10.1145/2684822.2685326
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The Berkeley AMPLab is creating a new approach to data analytics. Launched in early 2011, the lab aims to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and in crowds). The lab is realizing its ideas through the development of a freely available open source software stack called BDAS: the Berkeley Data Analytics Stack. In the four years the lab has been in operation, we've released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support advanced analytics and to make people a full-fledged resource for making sense of data. In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS with an emphasis on our newest efforts, including some or all of: the GraphX graph processing system, the Velox and MLBase machine learning platforms, and the SampleClean framework for hybrid human/computer data cleaning. Finally, I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost and quality requirements throughout the analytics lifecycle.
Pages: 1 / 1
Page count: 1