A parallel MPI plus OpenMP plus OpenCL algorithm for hybrid supercomputations of incompressible flows

Cited by: 22
Authors
Gorobets, A. V. [1, 2]
Trias, F. X. [1]
Oliva, A. [1]
Affiliations
[1] Tech Univ Catalonia, ETSEIAT, Heat & Mass Transfer Technol Ctr, Terrassa 08222, Spain
[2] Keldysh Inst Appl Math, Moscow 125047, Russia
Keywords
MPI; OpenMP; OpenCL; GPU; Parallel CFD; Turbulence; SCHUR-FOURIER DECOMPOSITION; COMPUTERS; SOLVER
DOI
10.1016/j.compfluid.2013.05.021
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This work is devoted to the development of efficient parallel algorithms for large-scale simulations of incompressible flows on hybrid supercomputers based on massively parallel accelerators. The governing equations are discretized using a high-order finite-volume scheme for Cartesian staggered meshes, with the only restriction being that at least one direction is periodic. Its "classical" MPI + OpenMP parallel implementation for CPUs was designed to scale up to 100,000 CPU cores. The new hybrid algorithm is built on a multi-level parallel model that exploits several layers of parallelism of a modern hybrid supercomputer. In this model, MPI and OpenMP are used on the first two levels to couple the nodes of a supercomputer and to engage its CPU cores. Computing accelerators are then used by means of the hardware-independent OpenCL computing standard. In this way, the implementation is adapted to a general computing model with central processors and math co-processors. This paper focuses on adapting the basic operations of the algorithm to the architecture of Graphics Processing Units (GPUs), without considering the multi-CPU communication scheme. The technology for porting the code to OpenCL is described, certain optimization approaches are presented, and relevant performance results of up to 80-90 GFLOPS on a GPU accelerator are demonstrated. Moreover, the experience with different GPU architectures is summarized, and a comparison based on this particular application is given for AMD and NVIDIA GPUs as well as for the CUDA and OpenCL frameworks. (C) 2013 Elsevier Ltd. All rights reserved.
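The multi-level model described in the abstract can be illustrated with a small sketch of how mesh cells might be distributed across the levels. This is not the authors' implementation: it only mimics the idea that nodes form the first level (MPI in the paper), while within each node the work is handled by CPU threads (OpenMP) and an accelerator (OpenCL). The function names `split_range` and `multilevel_partition`, the 1D cell indexing, and the `device_share` parameter (the fraction of a node's cells given to the accelerator) are all assumptions made for illustration.

```python
def split_range(n, parts):
    """Split the index range [0, n) into `parts` contiguous blocks
    whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    blocks = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def multilevel_partition(n_cells, n_nodes, n_threads, device_share):
    """Hypothetical work plan for a hybrid machine.

    Level 1: cells are split across nodes (the role MPI plays in the paper).
    Level 2: on each node, a fraction `device_share` (an assumed tuning
             parameter) goes to the accelerator, and the remaining cells
             are split across CPU threads.
    """
    plan = []
    for node, (lo, hi) in enumerate(split_range(n_cells, n_nodes)):
        n_local = hi - lo
        n_device = int(n_local * device_share)
        cpu_start = lo + n_device
        cpu_blocks = [
            (cpu_start + a, cpu_start + b)
            for a, b in split_range(n_local - n_device, n_threads)
        ]
        plan.append(
            {"node": node, "device": (lo, cpu_start), "threads": cpu_blocks}
        )
    return plan
```

Setting `device_share` to 1.0 sends all of a node's cells to the accelerator, which is closer to the GPU-focused scope the abstract describes; intermediate values are only a way to show how both levels could share the work.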
Pages: 764 - 772
Number of pages: 9
Related papers
50 in total
  • [1] A Hybrid MPI plus OpenMP Application for Processing Big Trajectory Data
    Stojanovic, Natalija
    Stojanovic, Dragan
    STUDIES IN INFORMATICS AND CONTROL, 2015, 24 (02): : 229 - 236
  • [2] Hybrid MPI plus OpenMP Parallelization of Scramjet Simulation with Hypergraph Partitioning
    Zeng Yao-yuan
    Zhao Wen-tao
    Wang Zheng-hua
    ADVANCES IN MANUFACTURING SCIENCE AND ENGINEERING, PTS 1-4, 2013, 712-715 : 1294 - +
  • [3] Hybrid MPI plus OpenMP Implementation of eXtended Discrete Element Method
    Checkaraou, Abdoul Wahid Mainassara
    Rousset, Alban
    Besseron, Xavier
    Varrette, Sebastien
    Peters, Bernhard
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 450 - 457
  • [4] OpenMP plus MPI Parallel Implementation of a Numerical Method for Solving a Kinetic Equation
    Titarev, V. A.
    Utyuzhnikov, S. V.
    Chikitkin, A. V.
    COMPUTATIONAL MATHEMATICS AND MATHEMATICAL PHYSICS, 2016, 56 (11) : 1919 - 1928
  • [5] Automatic Hybrid MPI plus OpenMP Code Generation with llc
    Reyes, Ruyman
    Dorta, Antonio J.
    Almeida, Francisco
    de Sande, Francisco
    RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, PROCEEDINGS, 2009, 5759 : 185 - 195
  • [6] Automatic Hybrid OpenMP plus MPI Program Generation for Dynamic Programming Problems
    VandenBerg, Denny R.
    Stout, Quentin F.
    2011 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2011, : 178 - 186
  • [7] A Parallel Approach for Evolutionary Induced Decision Trees. MPI plus OpenMP Implementation
    Czajkowski, Marcin
    Jurczuk, Krzysztof
    Kretowski, Marek
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2015, 9119 : 340 - 349
  • [8] MPI Thread-Level Checking for MPI plus OpenMP Applications
    Saillard, Emmanuelle
    Carribault, Patrick
    Barthou, Denis
    EURO-PAR 2015: PARALLEL PROCESSING, 2015, 9233 : 31 - 42
  • [9] Analyses on Performance of GROMACS in Hybird MPI plus OpenMP plus CUDA Cluster
    Li, Ce
    Chen, Wenbo
    Zhang, Yang
    Bai, Qifeng
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 904 - 911
  • [10] Dynamic load balancing of MPI plus OpenMP applications
    Corbalán, J
    Duran, A
    Labarta, J
    2004 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS, 2004, : 195 - 202