RHEEM: Enabling Cross-Platform Data Processing

Cited by: 28
Authors
Agrawal, Divy [2]
Chawla, Sanjay [1]
Contreras-Rojas, Bertty [1]
Elmagarmid, Ahmed [1]
Idris, Yasser [1]
Kaoudi, Zoi [1]
Kruse, Sebastian [3]
Lucas, Ji [1]
Mansour, Essam [1]
Ouzzani, Mourad [1]
Papotti, Paolo [1, 4]
Quiane-Ruiz, Jorge-Arnulfo [1]
Tang, Nan [1]
Thirumuruganathan, Saravanan [1]
Troudi, Anis [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Doha, Qatar
[2] UCSB, Santa Barbara, CA 93106 USA
[3] Hasso Plattner Inst, Potsdam, Germany
[4] Eurecom, Biot, France
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Volume 11 / Issue 11
Keywords
EFFICIENT;
DOI
10.14778/3236187.3236195
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present RHEEM, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with RHEEM, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
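The idea of splitting a task into subtasks and assigning each one to a platform so that the overall cost is minimized can be illustrated with a minimal sketch. This is not RHEEM's optimizer: it assumes a linear pipeline of operators, a fixed per-operator execution cost on each platform, and a flat data-movement cost whenever consecutive operators run on different platforms. The function name `assign_platforms`, the platform names `a` and `b`, and all cost values are hypothetical.

```python
def assign_platforms(operators, platforms, exec_cost, move_cost):
    """Pick one platform per operator in a linear pipeline to minimize total cost.

    Illustrative assumptions (not taken from the paper):
      exec_cost[op][p] is the cost of running operator `op` on platform `p`;
      move_cost is a flat charge whenever two consecutive operators
      run on different platforms.
    Returns (total_cost, [platform for each operator]).
    """
    # best[p] = cheapest (cost, plan) for the operators seen so far,
    # with the most recent operator placed on platform p.
    best = {p: (exec_cost[operators[0]][p], [p]) for p in platforms}

    for op in operators[1:]:
        nxt = {}
        for p in platforms:
            candidates = (
                (cost + (0 if q == p else move_cost) + exec_cost[op][p], plan + [p])
                for q, (cost, plan) in best.items()
            )
            nxt[p] = min(candidates, key=lambda c: c[0])
        best = nxt

    return min(best.values(), key=lambda c: c[0])


# Hypothetical three-step task with made-up costs.
operators = ["load", "filter", "train"]
platforms = ["a", "b"]
exec_cost = {
    "load": {"a": 1, "b": 5},
    "filter": {"a": 2, "b": 4},
    "train": {"a": 20, "b": 6},
}

total, plan = assign_platforms(operators, platforms, exec_cost, move_cost=3)
print(total, plan)
```

With these made-up numbers, running everything on `a` costs 23 and everything on `b` costs 15, while the mixed plan costs 12 even after paying the movement charge, which is the kind of gain the abstract attributes to cross-platform execution.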
Pages: 1414 - 1427
Page count: 14