Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations

Cited by: 7
Authors
Zhu, Xiaomin [1 ,2 ]
Zhang, Junchao [3 ]
Yoshii, Kazutomo [3 ]
Li, Shigang [4 ]
Zhang, Yunquan [4 ]
Balaji, Pavan [3 ]
Affiliations
[1] Natl Supercomp Ctr, Shandong Comp Sci Ctr, Jinan, Peoples R China
[2] Shandong Prov Key Lab Comp Networks, Jinan, Peoples R China
[3] Argonne Natl Lab, Argonne, IL 60439 USA
[4] Chinese Acad Sci, Inst Comp Technol, Beijing 100864, Peoples R China
Source
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2015
Keywords
MPI-3.0; process shared memory; intranode communication; stencil; multicore; communication
DOI
10.1109/CCGrid.2015.131
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
The recently released MPI-3.0 standard introduced a process-level shared-memory interface that enables processes within the same node to have direct load/store access to each other's memory. Such an interface allows applications to declare data structures that are shared by multiple MPI processes on the node. In this paper, we study the capabilities and performance implications of using MPI-3.0 shared memory in the context of a five-point stencil computation. Our analysis reveals that the use of MPI-3.0 shared memory has several unforeseen performance implications, including the disruption of certain compiler optimizations and the use of suboptimal page sizes inside the OS. Based on this analysis, we propose several methodologies for working around these issues, improving communication performance by 40-85% compared to the current MPI-1.0-based approach.
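For readers unfamiliar with the interface the abstract refers to, the following is a minimal sketch (not code from the paper) of allocating node-level shared memory with MPI-3.0: MPI_Comm_split_type groups the processes that can share memory, MPI_Win_allocate_shared gives each process a slab inside a shared window, and MPI_Win_shared_query returns a neighbor's base address for direct load/store. The communicator name, slab size, and the printf are purely illustrative assumptions.

/* Minimal MPI-3.0 shared-memory sketch (illustrative, not the paper's code). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Split MPI_COMM_WORLD into per-node communicators of processes
     * that can share memory. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Each process contributes a contiguous slab to a shared window;
     * the size here is an arbitrary illustration. */
    MPI_Aint size = 1024 * sizeof(double);
    double *my_slab;
    MPI_Win win;
    MPI_Win_allocate_shared(size, sizeof(double), MPI_INFO_NULL, node_comm,
                            &my_slab, &win);

    /* Query the base address of the left neighbor's slab; with it, halo
     * data can be read by direct load/store instead of message passing. */
    if (node_rank > 0) {
        MPI_Aint nbr_size;
        int nbr_disp;
        double *nbr_slab;
        MPI_Win_shared_query(win, node_rank - 1, &nbr_size, &nbr_disp,
                             &nbr_slab);
        /* Real stencil code would synchronize (e.g., MPI_Win_sync plus a
         * barrier) before reading the neighbor's memory. */
        printf("rank %d sees neighbor base %p\n", node_rank, (void *)nbr_slab);
    }

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

In a five-point stencil, this direct access lets a process read its neighbors' boundary rows in place, which is the communication pattern whose performance the paper analyzes.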
Pages: 1099-1106 (8 pages)