AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

Cited by: 35
Authors
Matsumura, Kazuaki [1, 4]
Zohouri, Hamid Reza [2, 4]
Wahib, Mohamed [3]
Endo, Toshio [4]
Matsuoka, Satoshi [5, 6]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Edgecortix Inc, Tokyo, Japan
[3] AIST, Tokyo, Japan
[4] Tokyo Inst Technol, Tokyo, Japan
[5] RIKEN CCS, Kobe, Hyogo, Japan
[6] RWBC OIL, Tokyo, Japan
Source
CGO'20: PROCEEDINGS OF THE 18TH ACM/IEEE INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION | 2020
Keywords
Stencil Computation; GPU; Automatic Code Generation; Temporal Blocking
DOI
10.1145/3368826.3377904
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Stencil computation is one of the most widely used compute patterns in high-performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving memory pressure from external memory to on-chip memory on GPUs. However, implementing these optimizations correctly and achieving high performance is difficult given the complexity of the GPU architecture and memory hierarchy. We propose AN5D, an automated stencil framework that transforms and optimizes stencil patterns in a given C source code and generates the corresponding CUDA code. Parameter tuning in our framework is guided by our performance model. Our novel optimization strategy reduces shared memory and register pressure in comparison to existing implementations, allowing performance to scale up to a temporal blocking degree of 10. We achieve the highest performance reported so far for all evaluated stencil benchmarks on the state-of-the-art Tesla V100 GPU.
Pages: 199-211
Page count: 13
Related papers
2 in total
  • [1] Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL
    Zohouri, Hamid Reza
    Podobas, Artur
    Matsuoka, Satoshi
    PROCEEDINGS OF THE 2018 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'18), 2018: 153-162
  • [2] Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt)
    Zhang, Lingqi
    Wahib, Mohamed
    Chen, Peng
    Meng, Jintao
    Wang, Xiao
    Endo, Toshio
    Matsuoka, Satoshi
    15TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPU, GPGPU 2023, 2023: 34-35