A parallel MPI plus OpenMP plus OpenCL algorithm for hybrid supercomputations of incompressible flows

Cited by: 22

Authors:

Gorobets, A. V. [1, 2]

Trias, F. X. [1]

Oliva, A. [1]

Affiliations:

[1] Tech Univ Catalonia, ETSEIAT, Heat & Mass Transfer Technol Ctr, Terrassa 08222, Spain

[2] Keldysh Inst Appl Math, Moscow 125047, Russia

Source:

COMPUTERS & FLUIDS | 2013 / Vol. 88

Keywords:

MPI; OpenMP; OpenCL; GPU; Parallel CFD; Turbulence; SCHUR-FOURIER DECOMPOSITION; COMPUTERS; SOLVER

DOI:

10.1016/j.compfluid.2013.05.021

Chinese Library Classification number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

The work is devoted to the development of efficient parallel algorithms for large-scale simulations of incompressible flows on hybrid supercomputers based on massively-parallel accelerators. The governing equations are discretized using a high-order finite-volume scheme for Cartesian staggered meshes, with the only restriction that at least one direction is periodic. Its "classical" MPI + OpenMP parallel implementation for CPUs was designed to scale up to 100,000 CPU cores. The new hybrid algorithm is developed on the basis of a multi-level parallel model that exploits several layers of parallelism of a modern hybrid supercomputer. In this model, MPI and OpenMP are used on the first two levels to couple the nodes of a supercomputer and to engage its CPU cores. Computing accelerators are then used by means of the hardware-independent OpenCL computing standard. In this way, the implementation is adapted to a general computing model with central processors and math co-processors. In this paper the work is focused on adapting the basic operations of the algorithm to architectures of Graphics Processing Units (GPU) without considering the multi-CPU communication scheme. The technology for porting the code to OpenCL is described, certain optimization approaches are presented, and relevant performance results reaching up to 80-90 GFLOPS on a GPU accelerator are demonstrated. Moreover, the experience with different CPU architectures is summarized and a comparison based on the particular application is given for AMD and NVIDIA GPUs as well as for CUDA and OpenCL frameworks. © 2013 Elsevier Ltd. All rights reserved.

Pages: 764-772

Page count: 9

50 items in total

[31] A parallel Non-Local means denoising algorithm implementation with OpenMP and OpenCL on Intel Xeon Phi Coprocessor
Zhu, Huming
Wu, Yanfei
Li, Pei
Wang, Duo
Shi, Wei
Zhang, Peng
Jiao, Licheng
JOURNAL OF COMPUTATIONAL SCIENCE, 2016, 17 : 591 - 598
[32] Hybrid MPI plus OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction
Gorobets, A.
Trias, F. X.
Borrell, R.
Lehmkuhl, O.
Oliva, A.
COMPUTERS & FLUIDS, 2011, 49 (01) : 101 - 109
[33] A Parallel Algorithm Based On OpenMP plus STM for FPGA Timing-Driven Placement
Zhang, Jia-qi
Lv, Hui-juan
Tan, Li-bo
Pan, Tao-tao
COMPUTER SCIENCE AND TECHNOLOGY (CST2016), 2017, : 1185 - 1193
[34] Fine-Grained MPI plus OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks
Richard, Jerome
Latu, Guillaume
Bigot, Julien
Gautier, Thierry
EURO-PAR 2019: PARALLEL PROCESSING, 2019, 11725 : 419 - 433
[35] A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
Mininni, Pablo D.
Rosenberg, Duane
Reddy, Raghu
Pouquet, Annick
PARALLEL COMPUTING, 2011, 37 (6-7) : 316 - 326
[36] Predicting Software Defects in Hybrid MPI and OpenMP Parallel Programs Using Machine Learning
Althiban, Amani S.
Alharbi, Hajar M.
Al Khuzayem, Lama A.
Eassa, Fathy Elbouraey
ELECTRONICS, 2024, 13 (01)
[37] A Framework for an Automatic Hybrid MPI plus OpenMP code generation
Hamidouche, Khaled
Falcou, Joel
Etiemble, Daniel
HIGH PERFORMANCE COMPUTING SYMPOSIUM 2011 (HPC 2011) - 2011 SPRING SIMULATION MULTICONFERENCE - BK 6 OF 8, 2011, 43 (02) : 48 - 55
[38] MPI plus OpenCL implementation of a phase-field method incorporating CALPHAD description of Gibbs energies on heterogeneous computing platforms
Tennyson, P. Gerald
Karthik, G. M.
Phanikumar, G.
COMPUTER PHYSICS COMMUNICATIONS, 2015, 186 : 48 - 64
[39] Investigating Dependency Graph Discovery Impact on Task-based MPI plus OpenMP Applications Performances
Pereira, Romain
Roussel, Adrien
Carribault, Patrick
Gautier, Thierry
PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 163 - 172
[40] Enabling Performance Efficient Runtime Support for Hybrid MPI plus UPC plus plus Programming Models
Hashmi, Jahanzeb Maqbool
Hamidouche, Khaled
Panda, Dhabaleswar K.
PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 1180 - 1187
