Performance Characterization of Spark Workloads on Shared NUMA Systems

Cited by: 2
Authors
Baig, Shuja-ur-Rehman [1 ,2 ,3 ]
Amaral, Marcelo [1 ,2 ]
Polo, Jorda [2 ]
Carrera, David [1 ,2 ]
Affiliations
[1] Univ Politecn Catalunya UPC, Barcelona, Spain
[2] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[3] Univ Punjab PU, Lahore, Pakistan
Source
2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2018) | 2018
Funding
European Research Council;
Keywords
Performance; Modeling; Characterization; Memory; NUMA; Spark; Benchmark; AWARE DATA;
DOI
10.1109/BigDataService.2018.00015
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Subject classification code
0812;
Abstract
As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, there is a growing need to optimize them for modern processors. Spark has gained momentum over the last few years among companies looking for high-performance solutions that can scale out across different cluster sizes. At the same time, modern processors can be connected to large amounts of physical memory, up to a few terabytes, which opens an enormous range of opportunities for runtimes and applications that aim to improve their performance by leveraging the low latency and high bandwidth of RAM. As a result, several applications today have started pushing the in-memory computing paradigm to accelerate tasks. To deliver such large physical memory capacities, hardware vendors have adopted Non-Uniform Memory Access (NUMA) architectures. This paper explores how Spark-based workloads are affected by NUMA-placement decisions, how different Spark configurations change delivered performance, how application characteristics can be used to predict workload-collocation conflicts, and how performance can be improved by collocating workloads on scale-up nodes. We study several workloads running on the IBM POWER8 processor and show that manual processor-pinning and workload-collocation strategies can improve Spark workload performance by up to 40%.
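As an illustration of the processor-pinning idea the abstract refers to (not the paper's exact method), a Spark job can be bound to a single NUMA node with the standard Linux `numactl` tool so that its threads and allocations stay local to one memory controller. The node IDs, core count, and Spark settings below are assumptions for the sketch:

```shell
# Sketch: pin a local-mode Spark run to NUMA node 0 (CPUs and memory).
# Node IDs and Spark options are illustrative, not taken from the paper.

# Inspect the NUMA topology first: nodes, CPUs per node, memory per node.
numactl --hardware

# Bind both execution and memory allocation to node 0, avoiding
# remote-memory accesses; a second, collocated workload could be
# bound to node 1 the same way.
numactl --cpunodebind=0 --membind=0 \
  spark-submit --master "local[8]" \
  --conf spark.driver.memory=16g \
  my_job.py
```

Binding memory (`--membind`) together with CPUs matters here: pinning only the threads still allows the kernel to satisfy allocations from a remote node, which reintroduces exactly the cross-node latency the pinning is meant to avoid.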
Pages: 41-48
Page count: 8
References
18 in total
[1] Ahn, J. HotCloud, 2012.
[2] [Anonymous]. USENIX ATC '11, 2011.
[3] Awan, A. J. arXiv:1604.08484, 2016.
[4] Awan, A. J.; Brorsson, M.; Vlassov, V.; Ayguade, E. Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters. 2016 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 2016, pp. 237-246.
[5] Blagodurov, S.; Zhuravlev, S.; Fedorova, A. Contention-Aware Scheduling on Multicore Systems. ACM Transactions on Computer Systems, 2010, 28(4).
[6] Drebes, A. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016, p. 44.
[7] Drebes, A.; Pop, A.; Heydemann, K.; Cohen, A.; Drach, N. Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management. 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), 2016, pp. 125-137.
[8] Ewart, T. Proceedings of the 6th International Workshop on Performance ..., 2015, p. 1.
[9] Lameter, C. NUMA (Non-Uniform Memory Access): An Overview. ACM Queue, 2013, 11(7): 40-51.
[10] LaRowe, R. P., Jr. Operating Systems Review, 1991, 25: 137. DOI: 10.1145/121133.121158.