Optimizing Cross-Platform Data Movement

Cited by: 5
Authors
Kruse, Sebastian [2]
Kaoudi, Zoi [1]
Quiane-Ruiz, Jorge-Arnulfo [1]
Chawla, Sanjay [1]
Naumann, Felix [2]
Contreras-Rojas, Bertty [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Ar Rayyan, Qatar
[2] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Source
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019) | 2019
DOI
10.1109/ICDE.2019.00162
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data analytics are moving beyond the limits of a single data processing platform. A cross-platform query optimizer is necessary to enable applications to run their tasks over multiple platforms efficiently and in a platform-agnostic manner. For the optimizer to be effective, it must consider data movement costs across different data processing platforms. In this paper, we present the graph-based data movement strategy used by RHEEM, our open-source cross-platform system. In particular, we (i) model the data movement problem as a new graph problem, which we prove to be NP-hard, and (ii) propose a novel graph exploration algorithm, which allows RHEEM to discover multiple hidden opportunities for cross-platform data processing.
Pages: 1642-1645
Number of pages: 4
Related papers
16 in total
[1]  
Agrawal D., 2016, EDBT, P479
[2]   RHEEM: Enabling Cross-Platform Data Processing [J].
Agrawal, Divy ;
Chawla, Sanjay ;
Contreras-Rojas, Bertty ;
Elmagarmid, Ahmed ;
Idris, Yasser ;
Kaoudi, Zoi ;
Kruse, Sebastian ;
Lucas, Ji ;
Mansour, Essam ;
Ouzzani, Mourad ;
Papotti, Paolo ;
Quiane-Ruiz, Jorge-Arnulfo ;
Tang, Nan ;
Thirumuruganathan, Saravanan ;
Troudi, Anis .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (11) :1414-1427
[3]   Rheem: Enabling Multi-Platform Task Execution [J].
Agrawal, Divy ;
Ba, Lamine ;
Berti-Equille, Laure ;
Chawla, Sanjay ;
Elmagarmid, Ahmed ;
Hammady, Hossam ;
Idris, Yasser ;
Kaoudi, Zoi ;
Khayyat, Zuhair ;
Kruse, Sebastian ;
Ouzzani, Mourad ;
Papotti, Paolo ;
Quiane-Ruiz, Jorge-Arnulfo ;
Tang, Nan ;
Zaki, Mohammed J. .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :2069-2072
[4]   Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources [J].
Begoli, Edmon ;
Camacho-Rodriguez, Jesus ;
Hyde, Julian ;
Mior, Michael J. ;
Lemire, Daniel .
SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, :221-230
[5]   A greedy approximation algorithm for the group Steiner problem [J].
Chekuri, C ;
Even, G ;
Kortsarz, G .
DISCRETE APPLIED MATHEMATICS, 2006, 154 (01) :15-34
[6]  
Doka K, 2016, 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), P194, DOI 10.1109/BigData.2016.7840605
[7]  
Elmore A, 2015, PROC VLDB ENDOW, V8, P1909
[8]  
Hand, 2015, EUROSYS, P1
[9]   PipeGen: Data Pipe Generator for Hybrid Analytics [J].
Haynes, Brandon ;
Cheung, Alvin ;
Balazinska, Magdalena .
PROCEEDINGS OF THE SEVENTH ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC 2016), 2016, :470-483
[10]  
Kaoudi Z., 2018, ICDE TUTORIAL