Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf

Cited by: 119
Authors
Reyes-Ortiz, Jorge L. [1]
Oneto, Luca [2]
Anguita, Davide [1]
Affiliations
[1] Univ Genoa, DIBRIS, I-16145 Genoa, Italy
[2] Univ Genoa, DITEN, I-16145 Genoa, Italy
Source
INNS CONFERENCE ON BIG DATA 2015 PROGRAM | 2015 / Vol. 53
Keywords
Big Data; Supervised Learning; Spark; Hadoop; MPI; OpenMP; Beowulf; Cloud; Parallel Computing; OPTIMIZATION; GRADIENT;
DOI
10.1016/j.procs.2015.07.286
Chinese Library Classification
TP301 [Theory, Methods];
Subject classification code
081202 ;
Abstract
One of the biggest challenges of the current big data landscape is our inability to process vast amounts of information in a reasonable time. In this work, we explore and compare two distributed computing frameworks implemented on commodity cluster architectures: MPI/OpenMP on Beowulf, which is high-performance oriented and exploits multi-machine/multicore infrastructures, and Apache Spark on Hadoop, which targets iterative algorithms through in-memory computing. We use the Google Cloud Platform service to create virtual machine clusters, run the frameworks, and evaluate two supervised machine learning algorithms: KNN and Pegasos SVM. Results obtained from experiments with a particle physics data set show that MPI/OpenMP outperforms Spark by more than one order of magnitude in terms of processing speed and provides more consistent performance. However, Spark offers a better data management infrastructure and the possibility of dealing with other aspects such as node failure and data replication.
Pages: 121-130
Page count: 10