Static and Dynamic Big Data Partitioning on Apache Spark

Cited by: 10
Authors
Bertolucci, Massimiliano [2]
Carlini, Emanuele [1]
Dazzi, Patrizio [1]
Lulli, Alessandro [1, 2]
Ricci, Laura [1, 2]
Affiliations
[1] CNR, Ist Sci & Tecnol Informaz, Pisa, Italy
[2] Univ Pisa, Dept Comp Sci, Pisa, Italy
Source
PARALLEL COMPUTING: ON THE ROAD TO EXASCALE | 2016, Vol. 27
Keywords
BigData; Graph algorithms; Data partitioning; Apache Spark
DOI
10.3233/978-1-61499-621-7-489
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many of today's large datasets are organized as graphs. Due to their size, it is often infeasible to process these graphs on a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance using a naive partitioning solution against more elaborate strategies, both static and dynamic.
Pages: 489-498
Page count: 10