Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators

Cited by: 22
Authors
Wyrzykowski, Roman [1]
Szustak, Lukasz [1]
Rojek, Krzysztof [1]
Affiliation
[1] Czestochowa Tech Univ, Inst Comp & Informat Sci, Czestochowa, Poland
Keywords
MPDATA advection algorithm; Stencil computation; GPU accelerators; Hybrid CPU-GPU architectures; Hierarchical decomposition; Autotuning; ADVECTION TRANSPORT ALGORITHM; PERFORMANCE; MULTI; IMPLEMENTATION; SIMULATION
DOI
10.1016/j.parco.2014.04.009
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP-OpenCL model of parallel programming opens the way to harness the power of CPU-GPU platforms in a portable way. In order to better utilize features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds, and better exploit the theoretical floating-point efficiency of CPU-GPU platforms. The main contributions of the paper are: (i) a method for the decomposition of the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations; (ii) a method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques; (iii) a method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, with support provided by the developed GPU task scheduler allowing for the flexible management of available resources; (iv) an approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.
Hybrid platforms tested in this study contain different numbers of CPUs and GPUs, from configurations consisting of a single CPU and a single GPU to the most elaborate configuration containing two CPUs and two GPUs. Processors from different vendors are employed in these systems: both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all the grid sizes and for all the tested platforms, the hybrid version with computations spread across CPU and GPU components allows us to achieve the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over GPU and CPU versions vary from 1.30 to 1.69, and from 1.95 to 2.25, respectively. (C) 2014 Elsevier B.V. All rights reserved.
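The trade-off named in the abstract's first contribution, less CPU-GPU communication at the cost of additional computations, can be illustrated with a minimal sketch. The code below is not the MPDATA scheme or the authors' implementation: it assumes a generic 1D three-point averaging stencil applied in several stages, and all names (`stencil_stage`, `run_stages`, `run_split`, `stencil_evals`) are hypothetical. Each of the two parts receives an overlapping slice of the input wide enough to run every stage without exchanging intermediate values with the other part.

```python
def stencil_stage(a):
    """Apply one 3-point averaging stage; the result is 2 elements shorter."""
    return [(a[i - 1] + a[i] + a[i + 1]) / 3.0 for i in range(1, len(a) - 1)]


def run_stages(a, stages):
    """Apply `stages` consecutive stencil stages to the whole array."""
    for _ in range(stages):
        a = stencil_stage(a)
    return a


def run_split(a, stages, split):
    """Compute the same result as run_stages in two independent parts.

    `split` is the first output index owned by the second part. Each part
    reads an overlapping input region (2 * stages elements wide), so no
    values are exchanged between stages; the overlap is recomputed instead.
    """
    halo = 2 * stages
    part_a = run_stages(a[: split + halo], stages)  # outputs [0, split)
    part_b = run_stages(a[split:], stages)          # outputs [split, end)
    return part_a + part_b


def stencil_evals(n, stages):
    """Number of stencil evaluations for an input of length n."""
    return sum(n - 2 * k for k in range(1, stages + 1))


# Illustrative sizes only (assumptions, not values from the paper).
n, stages, split = 32, 3, 13
data = [float((7 * i) % 11) for i in range(n)]

reference = run_stages(data, stages)
combined = run_split(data, stages, split)

# Redundant work introduced by the overlap between the two parts.
extra = (stencil_evals(split + 2 * stages, stages)
         + stencil_evals(n - split, stages)
         - stencil_evals(n, stages))

print(combined == reference, extra)
```

In this toy setting the split run reproduces the undecomposed result while performing `stages * (stages - 1)` extra stencil evaluations, which is the price of never synchronizing the two parts between stages.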
Pages: 425-447
Number of pages: 23