Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators

Cited by: 22

Authors:

Wyrzykowski, Roman [1]

Szustak, Lukasz [1]

Rojek, Krzysztof [1]

Affiliations:

[1] Czestochowa Tech Univ, Inst Comp & Informat Sci, Czestochowa, Poland

Source:

PARALLEL COMPUTING | 2014 / Volume 40 / Issue 08

Keywords:

MPDATA advection algorithm; Stencil computation; GPU accelerators; Hybrid CPU-GPU architectures; Hierarchical decomposition; Autotuning; ADVECTION TRANSPORT ALGORITHM; PERFORMANCE; MULTI; IMPLEMENTATION; SIMULATION;

DOI:

10.1016/j.parco.2014.04.009

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP-OpenCL model of parallel programming opens the way to harness the power of CPU-GPU platforms in a portable way. In order to better utilize features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds, and better exploit the theoretical floating point efficiency of CPU-GPU platforms. The main contributions of the paper are: a method for the decomposition of the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations; a method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques; a method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, with support provided by the developed GPU task scheduler allowing for the flexible management of available resources; and an approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.
Hybrid platforms tested in this study contain different numbers of CPUs and GPUs, from solutions consisting of a single CPU and a single GPU to the most elaborate configuration containing two CPUs and two GPUs. Processors of different vendors are employed in these systems - both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all the grid sizes and for all the tested platforms, the hybrid version with computations spread across CPU and GPU components allows us to achieve the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over GPU and CPU versions vary from 1.30 to 1.69, and from 1.95 to 2.25, respectively. © 2014 Elsevier B.V. All rights reserved.
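
Illustrative note: the abstract describes a decomposition that reduces communication and synchronization between CPU and GPU components at the cost of additional computations. The sketch below shows that general idea on a generic 3-point averaging stencil in 1D. It is not the MPDATA scheme and not the authors' implementation; the function names (`stencil_step`, `run_steps`, `run_decomposed`) and the two-part split are assumptions made only for illustration. Each part receives an extended input region, so it can run all stencil steps without exchanging intermediate results, and the cells in the overlap are computed redundantly by both parts.

```python
# Illustrative sketch only: a generic 3-point stencil, NOT the MPDATA scheme.
# All names here are hypothetical and chosen for this example.

def stencil_step(u):
    """One stencil sweep; the result is one cell shorter at each end."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]


def run_steps(u, steps):
    """Apply `steps` dependent sweeps to the whole input."""
    for _ in range(steps):
        u = stencil_step(u)
    return u


def run_decomposed(u, steps, split):
    """Compute the same result as run_steps in two independent parts.

    Part A produces output cells [0, split), part B the remaining ones.
    Each part is given the extra input cells it needs (a halo of
    2 * steps cells in total), so no intermediate data is exchanged
    between sweeps; the price is redundant work in the overlap.
    """
    halo = steps  # stencil radius 1, so the halo grows by 1 per sweep
    n_out = len(u) - 2 * halo
    if not 0 <= split <= n_out:
        raise ValueError("split must lie within the output range")
    part_a = run_steps(u[: split + 2 * halo], steps)  # e.g. the "CPU" part
    part_b = run_steps(u[split:], steps)              # e.g. the "GPU" part
    return part_a + part_b


if __name__ == "__main__":
    data = [float((i * 7) % 11) for i in range(40)]
    reference = run_steps(data, 3)
    decomposed = run_decomposed(data, 3, 15)
    print(decomposed == reference)
```

In a real hybrid code the two parts would run concurrently on different devices, and the split point would be chosen to balance their throughput; both aspects are outside the scope of this sketch.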

Citation

Pages: 425-447

Page count: 23
