RHEEM: Enabling Cross-Platform Data Processing

Cited by: 28
Authors
Agrawal, Divy [2]
Chawla, Sanjay [1]
Contreras-Rojas, Bertty [1]
Elmagarmid, Ahmed [1]
Idris, Yasser [1]
Kaoudi, Zoi [1]
Kruse, Sebastian [3]
Lucas, Ji [1]
Mansour, Essam [1]
Ouzzani, Mourad [1]
Papotti, Paolo [1, 4]
Quiane-Ruiz, Jorge-Arnulfo [1]
Tang, Nan [1]
Thirumuruganathan, Saravanan [1]
Troudi, Anis [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Doha, Qatar
[2] UCSB, Santa Barbara, CA 93106 USA
[3] Hasso Plattner Inst, Potsdam, Germany
[4] Eurecom, Biot, France
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 11 / No. 11
Keywords
EFFICIENT;
DOI
10.14778/3236187.3236195
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present RHEEM, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with RHEEM, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
Pages: 1414 - 1427
Page count: 14