Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers

Authors
A. Hart
R. Ansaloni
A. Gray
Affiliations
[1] Cray Exascale Research Initiative Europe, EPCC
[2] Cray Italy S.r.l.
[3] The University of Edinburgh
Keywords
Graphic Processing Unit; European Physical Journal Special Topic; Parallel Loop; Strong Scaling; Hybrid Node
DOI
Not available
Abstract
An increasing number of massively-parallel supercomputers are based on heterogeneous node architectures combining traditional, powerful multicore CPUs with energy-efficient GPU accelerators. Such systems offer high computational performance with modest power consumption. As the industry trend of closer integration of CPU and GPU silicon continues, these architectures are a possible template for future exascale systems. Given the longevity of large-scale parallel HPC applications, it is important that there is a mechanism for easy migration to such hybrid systems. The OpenACC programming model offers a directive-based method for porting existing codes to run on hybrid architectures. In this paper, we describe our experiences in porting the Himeno benchmark to run on the Cray XK6 hybrid supercomputer. We describe the OpenACC programming model and the changes needed in the code, both to port the functionality and to tune the performance. Despite the additional PCIe-related overheads when transferring data from one GPU to another over the Cray Gemini interconnect, we find the application gives very good performance and scales well. Of particular interest is the facility to launch OpenACC kernels and data transfers asynchronously, which speeds the Himeno benchmark by 5%–10%. Comparing performance with an optimised code on a similar CPU-based system (using 32 threads per node), we find the OpenACC GPU version to be just under twice the speed in a node-for-node comparison. This speed-up is limited by the computational simplicity of the Himeno benchmark and is likely to be greater for more complicated applications.
Pages: 5 - 16
Number of pages: 11