Locality-Aware Parallel Process Mapping for Multi-Core HPC Systems

Cited by: 14
Authors
Hursey, Joshua [1]
Squyres, Jeffrey M. [1]
Dontje, Terry [1]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Source
2011 IEEE International Conference on Cluster Computing (CLUSTER) | 2011
Keywords
Process Affinity; Locality; NUMA; MPI; Resource Management
DOI
10.1109/CLUSTER.2011.59
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
High Performance Computing (HPC) systems are composed of servers containing an ever-increasing number of cores. With such high processor core counts, non-uniform memory access (NUMA) architectures are almost universally used to reduce inter-processor and memory communication bottlenecks by distributing processors and memory throughout a server-internal networking topology. Application studies have shown that tuning the placement of processes within a server's NUMA networking topology to suit the application can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale HPC applications. This paper presents the Locality-Aware Mapping Algorithm (LAMA) for distributing the individual processes of a parallel application across processing resources in an HPC system, paying particular attention to the internal server NUMA topologies. The algorithm supports both homogeneous and heterogeneous hardware systems, and dynamically adapts to the available hardware and the user-specified process layout at run-time. As implemented in Open MPI, LAMA provides 362,880 mapping permutations and can naturally scale out to additional hardware resources as they become available in future architectures.
Pages: 527-531
Page count: 5
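Illustration
The abstract describes mapping processes across a hierarchy of hardware resources according to a user-specified layout, and reports 362,880 mapping permutations. That figure equals 9!, which would be consistent with ordering nine distinct resource levels, although the abstract does not say so; treat that reading as an assumption. The sketch below is not the LAMA implementation. The function `map_processes`, the three-level hierarchy, and the level names are hypothetical, chosen only to show how the order in which levels are traversed changes where consecutive ranks land.

```python
from itertools import product
from math import factorial


def map_processes(levels, order, nprocs):
    """Assign ranks 0..nprocs-1 to hardware slots (illustrative only).

    levels: dict mapping a level name to the number of units at that level
            (hypothetical, homogeneous hierarchy).
    order:  level names listed from fastest-varying to slowest-varying.
    Returns a list of dicts, one per rank, giving the unit index per level.
    """
    # itertools.product varies its last argument fastest, so reverse the order.
    slowest_first = list(reversed(order))
    slots = [
        dict(zip(slowest_first, idx))
        for idx in product(*(range(levels[name]) for name in slowest_first))
    ]
    # Wrap around if there are more ranks than slots (oversubscription).
    return [slots[rank % len(slots)] for rank in range(nprocs)]


# Hypothetical system: 2 nodes, 2 sockets per node, 2 cores per socket.
levels = {"node": 2, "socket": 2, "core": 2}

# Fill cores first: consecutive ranks stay on the same node.
packed = map_processes(levels, ["core", "socket", "node"], 4)

# Fill nodes first: consecutive ranks alternate between nodes.
scattered = map_processes(levels, ["node", "socket", "core"], 4)

# Number of orderings of nine distinct levels (assumed interpretation).
orderings_of_nine = factorial(9)
```

With three levels there are 3! = 6 possible traversal orders; each additional level multiplies the count, which is how a nine-level ordering would reach the figure quoted in the abstract.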