RHEEM: Enabling Cross-Platform Data Processing

Cited by: 28
Authors
Agrawal, Divy [2]
Chawla, Sanjay [1]
Contreras-Rojas, Bertty [1]
Elmagarmid, Ahmed [1]
Idris, Yasser [1]
Kaoudi, Zoi [1]
Kruse, Sebastian [3]
Lucas, Ji [1]
Mansour, Essam [1]
Ouzzani, Mourad [1]
Papotti, Paolo [1, 4]
Quiane-Ruiz, Jorge-Arnulfo [1]
Tang, Nan [1]
Thirumuruganathan, Saravanan [1]
Troudi, Anis [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Doha, Qatar
[2] UCSB, Santa Barbara, CA 93106 USA
[3] Hasso Plattner Inst, Potsdam, Germany
[4] Eurecom, Biot, France
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 11 / No. 11
Keywords
EFFICIENT;
DOI
10.14778/3236187.3236195
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present RHEEM, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with RHEEM, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
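The idea of splitting a task into subtasks and assigning each one to a platform so that overall cost is minimized can be illustrated with a toy sketch. This is not RHEEM's optimizer or API: the function name `assign_platforms`, the platform names, the per-operator costs, and the flat data-movement cost are all hypothetical, and the task is assumed to be a simple linear pipeline.

```python
from typing import Dict, List, Tuple


def assign_platforms(
    op_costs: List[Dict[str, float]],
    move_cost: float,
) -> Tuple[float, List[str]]:
    """Pick one platform per operator in a linear pipeline, minimizing
    execution cost plus a flat cost for every switch between platforms.
    (Hypothetical helper for illustration only.)"""
    # best[p] = (cheapest cost of a plan ending on platform p, that plan)
    best = {p: (c, [p]) for p, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for p, c in costs.items():
            # Cheapest way to arrive at platform p from any previous platform.
            prev_cost, prev_plan = min(
                (
                    (cost + (0.0 if q == p else move_cost), plan)
                    for q, (cost, plan) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[p] = (prev_cost + c, prev_plan + [p])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Made-up cost estimates for three consecutive operators.
ops = [
    {"single_node": 1.0, "cluster": 5.0},   # small input load
    {"single_node": 50.0, "cluster": 8.0},  # heavy join
    {"single_node": 2.0, "cluster": 6.0},   # small final aggregate
]
total, plan = assign_platforms(ops, move_cost=3.0)
print(total, plan)  # 17.0 ['single_node', 'cluster', 'single_node']
```

With these made-up numbers, running everything on one platform would cost 53.0 (single_node) or 19.0 (cluster), while the mixed plan costs 17.0 even after paying for two platform switches.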
Pages: 1414 - 1427
Number of pages: 14