An Initial Assessment of NVSHMEM for High Performance Computing

Cited by: 9
Authors
Hsu, Chung-Hsing [1]
Imam, Neena [1]
Langer, Akhil [2]
Potluri, Sreeram [2]
Newburn, Chris J. [2]
Affiliations
[1] Oak Ridge Natl Lab, Comp & Computat Sci, Oak Ridge, TN 37830 USA
[2] NVIDIA Corp, Compute Software, Santa Clara, CA USA
Source
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020) | 2020
Keywords
High Performance Computing (HPC); CUDA; OpenSHMEM; scalability
DOI
10.1109/IPDPSW50202.2020.00104
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
High Performance Computing has been a driving force behind important tasks such as scientific discovery and deep learning. It tends to achieve performance through greater concurrency and heterogeneity, where the underlying complexity of richer topologies is managed through software abstraction. In this paper, we present our initial assessment of NVSHMEM, an experimental programming library that supports the Partitioned Global Address Space programming model for NVIDIA GPU clusters. NVSHMEM offers several concrete advantages. One is that it reduces overheads and software complexity by allowing communication and computation to be interleaved rather than separated into different phases. Another is that it implements the OpenSHMEM specification to provide efficient fine-grained one-sided communication, streamlining away overheads due to tag matching, wildcards, and unexpected messages, which have a compounding effect with increasing concurrency. It also offers ease of use by abstracting away the low-level configuration operations that are required to enable low-overhead communication and direct loads and stores across processes. We evaluated NVSHMEM in terms of usability, functionality, and scalability by running two math kernels, matrix multiplication and a Jacobi solver, on the 27,648-GPU Summit supercomputer. Our exercise of NVSHMEM at scale contributed to making NVSHMEM more robust and preparing it for production release.
Pages: 617-626
Page count: 10