Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers

Authors
A. Hart
R. Ansaloni
A. Gray
Affiliations
[1] Cray Exascale Research Initiative Europe, EPCC
[2] Cray Italy S.r.l.
[3] The University of Edinburgh
Keywords
Graphics Processing Unit; European Physical Journal Special Topic; Parallel Loop; Strong Scaling; Hybrid Node
DOI
Not available
Abstract
An increasing number of massively-parallel supercomputers are based on heterogeneous node architectures combining traditional, powerful multicore CPUs with energy-efficient GPU accelerators. Such systems offer high computational performance with modest power consumption. As the industry trend of closer integration of CPU and GPU silicon continues, these architectures are a possible template for future exascale systems. Given the longevity of large-scale parallel HPC applications, it is important that there is a mechanism for easy migration to such hybrid systems. The OpenACC programming model offers a directive-based method for porting existing codes to run on hybrid architectures. In this paper, we describe our experiences in porting the Himeno benchmark to run on the Cray XK6 hybrid supercomputer. We describe the OpenACC programming model and the changes needed in the code, both to port the functionality and to tune the performance. Despite the additional PCIe-related overheads when transferring data from one GPU to another over the Cray Gemini interconnect, we find the application gives very good performance and scales well. Of particular interest is the facility to launch OpenACC kernels and data transfers asynchronously, which speeds the Himeno benchmark by 5%–10%. Comparing performance with an optimised code on a similar CPU-based system (using 32 threads per node), we find the OpenACC GPU version to be just under twice the speed in a node-for-node comparison. This speed-up is limited by the computational simplicity of the Himeno benchmark and is likely to be greater for more complicated applications.
Pages: 5–16
Page count: 11