RHEEM: Enabling Cross-Platform Data Processing

Cited by: 28
Authors
Agrawal, Divy [2]
Chawla, Sanjay [1]
Contreras-Rojas, Bertty [1]
Elmagarmid, Ahmed [1]
Idris, Yasser [1]
Kaoudi, Zoi [1]
Kruse, Sebastian [3]
Lucas, Ji [1]
Mansour, Essam [1]
Ouzzani, Mourad [1]
Papotti, Paolo [1, 4]
Quiane-Ruiz, Jorge-Arnulfo [1]
Tang, Nan [1]
Thirumuruganathan, Saravanan [1]
Troudi, Anis [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Doha, Qatar
[2] UCSB, Santa Barbara, CA 93106 USA
[3] Hasso Plattner Inst, Potsdam, Germany
[4] Eurecom, Biot, France
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018, Vol. 11, No. 11
Keywords
EFFICIENT;
DOI
10.14778/3236187.3236195
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present RHEEM, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with RHEEM, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
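The idea in the abstract of splitting a task into subtasks and assigning each to a platform so as to minimize overall cost can be illustrated with a minimal sketch. This is not RHEEM's optimizer or API: the function `assign_platforms`, the operator and platform names, the cost figures, and the flat per-switch data movement cost are all hypothetical, and the task is assumed to be a simple linear pipeline.

```python
from typing import Dict, List, Tuple


def assign_platforms(
    operators: List[str],
    exec_cost: Dict[str, Dict[str, float]],
    move_cost: float,
) -> Tuple[float, List[str]]:
    """Choose one platform per operator of a linear pipeline.

    Minimizes the sum of per-operator execution costs plus a flat
    (hypothetical) data movement cost for every platform switch.
    """
    # best[p] = (cheapest cost of the pipeline so far if it ends on p, plan)
    first = operators[0]
    best = {p: (c, [p]) for p, c in exec_cost[first].items()}

    for op in operators[1:]:
        nxt = {}
        for p, c in exec_cost[op].items():
            # Extend every partial plan; pay move_cost only on a switch.
            candidates = [
                (cost + (0.0 if q == p else move_cost) + c, plan + [p])
                for q, (cost, plan) in best.items()
            ]
            nxt[p] = min(candidates, key=lambda t: t[0])
        best = nxt

    return min(best.values(), key=lambda t: t[0])


# Hypothetical cost estimates (arbitrary units), for illustration only.
COSTS = {
    "scan": {"dbms": 1.0, "spark": 4.0},
    "filter": {"dbms": 1.0, "spark": 3.0},
    "train": {"dbms": 20.0, "spark": 5.0},
}

if __name__ == "__main__":
    total, plan = assign_platforms(["scan", "filter", "train"], COSTS, move_cost=2.0)
    print(total, plan)
```

With these made-up numbers, the cheapest plan mixes platforms and costs less than running every operator on either platform alone, which is the kind of trade-off a cost-based cross-platform optimizer has to weigh.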
Pages: 1414-1427
Page count: 14