Exploiting Geometric Partitioning in Task Mapping for Parallel Computers

Cited by: 28
Authors
Deveci, Mehmet [1]
Rajamanickam, Sivasankaran [2]
Leung, Vitus J. [2]
Pedretti, Kevin [2]
Olivier, Stephen L. [2]
Bunde, David P. [3]
Çatalyürek, Ümit V. [1]
Devine, Karen [2]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Sandia Natl Labs, Albuquerque, NM 87185 USA
[3] Knox Coll, Galesburg, IL 61401 USA
Source
2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM | 2014
DOI
10.1109/IPDPS.2014.15
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by "nearby" cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app MiniGhost, our mapping method reduced execution time by 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, MiniMD, our mapping method reduced communication time by 26% on average on 6,144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD.
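The core idea in the abstract (partition the tasks and the processors with the same geometric algorithm, then pair up corresponding parts) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain recursive coordinate bisection as a stand-in for the paper's geometric partitioner, one task per core, and a power-of-two count. The function names `rcb` and `geometric_map` are hypothetical.

```python
def rcb(ids, coords, depth):
    """Recursive coordinate bisection (stand-in partitioner, an assumption).

    Splits `ids` into 2**depth equal-size parts, returned in a fixed
    left-to-right order so parts from two runs can be paired by index.
    """
    if depth == 0:
        return [ids]
    dims = len(coords[ids[0]])
    # Cut along the dimension with the largest extent among these points.
    dim = max(
        range(dims),
        key=lambda d: max(coords[i][d] for i in ids) - min(coords[i][d] for i in ids),
    )
    # Break coordinate ties by id so the split is deterministic.
    ordered = sorted(ids, key=lambda i: (coords[i][dim], i))
    mid = len(ordered) // 2
    return rcb(ordered[:mid], coords, depth - 1) + rcb(ordered[mid:], coords, depth - 1)


def geometric_map(task_coords, core_coords):
    """Map task index -> core index by pairing the k-th task part with the k-th core part."""
    n = len(task_coords)
    if n == 0 or n != len(core_coords) or n & (n - 1) != 0:
        raise ValueError("sketch assumes equal, power-of-two numbers of tasks and cores")
    depth = n.bit_length() - 1  # log2(n), so every part holds exactly one id
    task_parts = rcb(list(range(n)), task_coords, depth)
    core_parts = rcb(list(range(n)), core_coords, depth)
    return {t[0]: c[0] for t, c in zip(task_parts, core_parts)}


if __name__ == "__main__":
    # A 2x2 task grid mapped onto four cores scattered in a network (made-up coordinates).
    tasks = [(0, 0), (0, 1), (1, 0), (1, 1)]
    cores = [(7, 5), (0, 0), (7, 0), (0, 5)]
    print(geometric_map(tasks, cores))
```

Because both point sets are cut in the same order, tasks that are neighbors in the application domain tend to land on cores that are neighbors in the network, even when the allocated nodes are sparse.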
Pages: 10