A parallel MPI plus OpenMP plus OpenCL algorithm for hybrid supercomputations of incompressible flows

Cited by: 22

Authors:

Gorobets, A. V. [1, 2]

Trias, F. X. [1]

Oliva, A. [1]

Affiliations:

[1] Tech Univ Catalonia, ETSEIAT, Heat & Mass Transfer Technol Ctr, Terrassa 08222, Spain

[2] Keldysh Inst Appl Math, Moscow 125047, Russia

Source:

COMPUTERS & FLUIDS | 2013 / Vol. 88

Keywords:

MPI; OpenMP; OpenCL; GPU; Parallel CFD; Turbulence; SCHUR-FOURIER DECOMPOSITION; GPU; COMPUTERS; SOLVER;

DOI:

10.1016/j.compfluid.2013.05.021

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The work is devoted to the development of efficient parallel algorithms for large-scale simulations of incompressible flows on hybrid supercomputers based on massively parallel accelerators. The governing equations are discretized using a high-order finite-volume scheme for Cartesian staggered meshes, with the only restriction being that at least one direction is periodic. Its "classical" MPI + OpenMP parallel implementation for CPUs was designed to scale up to 100,000 CPU cores. The new hybrid algorithm is built on a multi-level parallel model that exploits several layers of parallelism of a modern hybrid supercomputer. In this model, MPI and OpenMP are used on the first two levels to couple the nodes of a supercomputer and to engage its CPU cores. Computing accelerators are then used by means of the hardware-independent OpenCL computing standard. In this way, the implementation is adapted to a general computing model with central processors and math co-processors. This paper focuses on adapting the basic operations of the algorithm to the architectures of Graphics Processing Units (GPUs), without considering the multi-CPU communication scheme. The technology of porting the code to OpenCL is described, certain optimization approaches are presented, and relevant performance results of up to 80-90 GFLOPS on a GPU accelerator are demonstrated. Moreover, the experience with different CPU architectures is summarized, and a comparison based on the particular application is given for AMD and NVIDIA GPUs as well as for the CUDA and OpenCL frameworks. (C) 2013 Elsevier Ltd. All rights reserved.
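
The multi-level model described in the abstract can be pictured as a nested partition of the mesh cells: first across nodes, then across CPU cores, then into batches sized for an accelerator. The following is a minimal, illustrative sketch only, not the authors' implementation. It uses plain Python with no MPI, OpenMP, or OpenCL calls, and the function names, the sizes, and the fixed-size chunking rule at the third level are all assumptions introduced for illustration.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    blocks = []
    lo = start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def three_level_partition(n_cells, n_nodes, n_threads, work_group_size):
    """Map each (node, thread) pair to the list of cell chunks it would offload.

    Level 1 (MPI in the paper): cells are split across nodes.
    Level 2 (OpenMP in the paper): each node's block is split across threads.
    Level 3 (OpenCL in the paper): each thread's block is cut into chunks of
    at most `work_group_size` cells. This chunking rule is a hypothetical
    stand-in, chosen only to make the third level visible.
    """
    plan = {}
    for node, (n_lo, n_hi) in enumerate(split_range(0, n_cells, n_nodes)):
        for thread, (t_lo, t_hi) in enumerate(split_range(n_lo, n_hi, n_threads)):
            plan[(node, thread)] = [
                (lo, min(lo + work_group_size, t_hi))
                for lo in range(t_lo, t_hi, work_group_size)
            ]
    return plan


# Hypothetical sizes: 1000 cells, 2 nodes, 4 threads per node, chunks of 64 cells.
plan = three_level_partition(n_cells=1000, n_nodes=2, n_threads=4, work_group_size=64)
covered = sorted(chunk for chunks in plan.values() for chunk in chunks)
```

Here `covered` lists every chunk in index order, which makes it easy to check that the three levels together tile the whole mesh without gaps or overlaps.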


Pages: 764-772

Number of pages: 9
