DMR API: Improving cluster productivity by turning applications into malleable

Cited by: 11
Authors
Iserte, Sergio [1]
Mayo, Rafael [1]
Quintana-Orti, Enrique S. [1]
Beltran, Vicenc [2]
Pena, Antonio J. [2]
Affiliations
[1] UJI, Castellon de La Plana, Spain
[2] BSC, Barcelona, Spain
Funding
European Union Horizon 2020
Keywords
MPI malleability; Job reconfiguration; Dynamic reallocation; Smart resource utilization; Adaptive workload
DOI
10.1016/j.parco.2018.07.006
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Adaptive workloads can change the configuration of their jobs on the fly, in terms of the number of processes. To carry out these job reconfigurations, we have designed a methodology that enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between the workload manager, which is aware of the job queue and the resource allocation, and the parallel runtime, which can transparently handle the processes and the program data, is crucial for our throughput-aware malleability methodology. Hence, when a job triggers a reconfiguration, the resource manager checks the cluster status and returns the appropriate action: i) expand, if there are spare resources; ii) shrink, if queued jobs can be initiated; or iii) none, if no change can improve the global productivity. In this paper, we describe the internals of our framework and demonstrate how it reduces the global workload completion time while making more efficient use of the underlying resources. For this purpose, we present a thorough study of adaptive workload processing, showing the detailed behavior of our framework in representative experiments. (C) 2018 Elsevier B.V. All rights reserved.
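The three-way answer the resource manager returns to a reconfiguration request (expand, shrink, or none) can be illustrated with a small sketch. Everything in it is a hypothetical illustration, not the DMR API or the paper's actual scheduling policy: the function `decide`, the `MalleableJob` fields, and the specific rules (shrink only as far as the head of the queue needs, and expand only when no job is waiting) are assumptions chosen to make the expand/shrink/none logic concrete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple


class Action(Enum):
    EXPAND = "expand"
    SHRINK = "shrink"
    NONE = "none"


@dataclass
class MalleableJob:
    # Hypothetical description of a running malleable job, in nodes.
    current: int  # nodes the job holds now
    minimum: int  # fewest nodes the job can run on
    maximum: int  # most nodes the job can use


def decide(job: MalleableJob, free_nodes: int,
           queued: Sequence[int]) -> Tuple[Action, int]:
    """Return (action, new_node_count) for a reconfiguration request.

    Hypothetical policy. `queued` lists the node demands of pending jobs,
    head of the queue first.
    """
    if queued:
        # Nodes the head job still lacks after using the idle ones.
        missing = queued[0] - free_nodes
        if missing > 0 and job.current - missing >= job.minimum:
            # Release just enough nodes for the head job to start.
            return Action.SHRINK, job.current - missing
        # Either the head job already fits in the idle nodes, or shrinking
        # this job cannot free enough of them: leave the idle nodes to the queue.
        return Action.NONE, job.current
    if free_nodes > 0 and job.current < job.maximum:
        # Nobody is waiting: grow into the spare nodes, up to the job's limit.
        return Action.EXPAND, min(job.maximum, job.current + free_nodes)
    # No spare nodes, or the job is already at its maximum size.
    return Action.NONE, job.current


job = MalleableJob(current=8, minimum=2, maximum=16)
decide(job, free_nodes=4, queued=[])    # spare nodes, empty queue: expand to 12
decide(job, free_nodes=1, queued=[5])   # head job needs 4 more nodes: shrink to 4
decide(job, free_nodes=0, queued=[32])  # shrinking cannot help: no change
```

Giving queued jobs priority over expansion is one way to express the throughput-aware goal described above, since idle nodes are never absorbed by a running job while another job waits for them; a real resource manager would weigh many more factors (priorities, backfilling, reconfiguration cost).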
Pages: 54-66
Number of pages: 13