Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems

Cited by: 8
Authors
Potluri, Sreeram [1 ]
Rossetti, Davide [1 ]
Becker, Donald [1 ]
Poole, Duncan [1 ]
Venkata, Manjunath Gorentla [2 ]
Hernandez, Oscar [2 ]
Shamis, Pavel [2 ]
Lopez, M. Graham [2 ]
Baker, Mathew [2 ]
Poole, Wendy [3 ]
Affiliations
[1] NVIDIA Corp, Santa Clara, CA USA
[2] Oak Ridge Natl Lab, ESSC, Oak Ridge, TN USA
[3] Open Source Software Solut, Knoxville, TN USA
Source
OPENSHMEM AND RELATED TECHNOLOGIES: EXPERIENCES, IMPLEMENTATIONS, AND TECHNOLOGIES, OPENSHMEM 2015 | 2015 / Vol. 9397
DOI
10.1007/978-3-319-26428-8_2
CLC Classification: TP31 [Computer Software]
Subject Classification Codes: 081202; 0835
Abstract
Extreme-scale systems with compute accelerators such as Graphics Processing Units (GPUs) have become popular for executing scientific applications. These systems are typically programmed using MPI and CUDA (for NVIDIA-based GPUs). However, the MPI+CUDA approach has several drawbacks. The orchestration required between the compute and communication phases of application execution, and the constraint that communication can only be initiated from serial portions of the code running on the Central Processing Unit (CPU), lead to scaling bottlenecks. To address these drawbacks, we explore the viability of using OpenSHMEM for programming these systems. In this paper, we first make a case for supporting GPU-initiated communication and for the suitability of the OpenSHMEM programming model. Second, we present NVSHMEM, a prototype implementation of the proposed programming approach; port Stencil and Transpose benchmarks, which are representative of many scientific applications, from the MPI+CUDA model to OpenSHMEM; and evaluate the design and implementation of NVSHMEM. Finally, we discuss the opportunities and challenges of using OpenSHMEM to program these systems, and propose extensions to OpenSHMEM to achieve the full potential of this programming approach.
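As an illustration of the GPU-initiated communication style the abstract describes, the following is a minimal hypothetical sketch in the spirit of NVSHMEM. The API names used here (`nvshmem_malloc`, `nvshmem_float_p`, `nvshmemx_barrier_all_on_stream`) follow the later publicly released NVSHMEM library and are assumptions for illustration; they are not necessarily the prototype interface evaluated in the paper.

```cuda
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE's kernel writes directly into a peer's symmetric buffer,
// so control never has to return to the CPU between the compute and
// communication phases -- the scaling bottleneck the abstract names.
__global__ void exchange(float *buf, int n, int peer) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // GPU-initiated one-sided put of one element to PE `peer`.
        nvshmem_float_p(buf + i, (float)i, peer);
    }
}

int main(void) {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;   // simple ring exchange

    const int n = 1024;
    // Allocate from the symmetric heap so remote PEs can target it.
    float *buf = (float *)nvshmem_malloc(n * sizeof(float));

    exchange<<<(n + 255) / 256, 256>>>(buf, n, peer);
    // Order the puts before the buffers are reused.
    nvshmemx_barrier_all_on_stream(0);
    cudaDeviceSynchronize();

    nvshmem_free(buf);
    nvshmem_finalize();
    return 0;
}
```

Contrast this with MPI+CUDA, where the kernel would have to finish, return control to the host, and let the CPU issue `MPI_Send`/`MPI_Recv` before the next compute phase could launch.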
Pages: 18-35 (18 pages)