A parallel MPI+OpenMP+OpenCL algorithm for hybrid supercomputations of incompressible flows

Cited by: 22
Authors
Gorobets, A. V. [1, 2]
Trias, F. X. [1]
Oliva, A. [1]
Affiliations
[1] Tech Univ Catalonia, ETSEIAT, Heat & Mass Transfer Technol Ctr, Terrassa 08222, Spain
[2] Keldysh Inst Appl Math, Moscow 125047, Russia
Keywords
MPI; OpenMP; OpenCL; GPU; Parallel CFD; Turbulence; Schur-Fourier decomposition; Computers; Solver
DOI
10.1016/j.compfluid.2013.05.021
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This work is devoted to the development of efficient parallel algorithms for large-scale simulations of incompressible flows on hybrid supercomputers based on massively parallel accelerators. The governing equations are discretized using a high-order finite-volume scheme for Cartesian staggered meshes, with the only restriction that at least one direction is periodic. Its "classical" MPI + OpenMP parallel implementation for CPUs was designed to scale up to 100,000 CPU cores. The new hybrid algorithm is built on a multi-level parallel model that exploits several layers of parallelism of a modern hybrid supercomputer. In this model, MPI and OpenMP are used on the first two levels to couple the nodes of a supercomputer and to engage its CPU cores. Computing accelerators are then used by means of the hardware-independent OpenCL computing standard. In this way, the implementation is adapted to a general computing model with central processors and math co-processors. This paper focuses on adapting the basic operations of the algorithm to the architecture of Graphics Processing Units (GPUs), without considering the multi-CPU communication scheme. The technology for porting the code to OpenCL is described, certain optimization approaches are presented, and performance results of up to 80-90 GFLOPS on a GPU accelerator are demonstrated. Moreover, the experience with different CPU architectures is summarized, and a comparison based on this particular application is given for AMD and NVIDIA GPUs as well as for the CUDA and OpenCL frameworks. (C) 2013 Elsevier Ltd. All rights reserved.
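The multi-level model described in the abstract can be pictured with a small partitioning sketch. This is only an illustration under stated assumptions, not the authors' implementation: it uses plain Python instead of MPI, OpenMP, or OpenCL, treats the mesh as a flat range of cell indices, and the names `split_range` and `three_level_partition` are hypothetical. Level 1 splits the cells among nodes (the role MPI plays), level 2 splits each node's cells among CPU threads (the role of OpenMP), and level 3 tiles the same node-local cells into fixed-size work-groups, as an accelerator programmed through OpenCL might process them.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # first `extra` chunks get one more cell
        bounds.append((start, start + size))
        start += size
    return bounds


def three_level_partition(n_cells, n_nodes, n_threads, workgroup_size):
    """Return, per node, its cell range, per-thread ranges, and device work-group tiles.

    Assumption (hypothetical layout): CPU threads and the accelerator are shown
    as two alternative ways of covering the same node-local cell range.
    """
    plan = []
    for node, (n0, n1) in enumerate(split_range(n_cells, n_nodes)):
        # Level 2: contiguous per-thread ranges inside the node's subdomain.
        threads = [(n0 + t0, n0 + t1) for t0, t1 in split_range(n1 - n0, n_threads)]
        # Level 3: fixed-size tiles; the last tile may be shorter.
        groups = [(g, min(g + workgroup_size, n1)) for g in range(n0, n1, workgroup_size)]
        plan.append({"node": node, "cells": (n0, n1),
                     "threads": threads, "work_groups": groups})
    return plan


if __name__ == "__main__":
    for entry in three_level_partition(10, 2, 2, 2):
        print(entry)
```

In a real hybrid code the node-local range would more likely be a 3D subdomain with halo cells, and the split between CPU cores and accelerators would depend on their relative throughput; the flat index range here only keeps the three levels easy to see.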
Pages: 764-772
Number of pages: 9
Related papers
50 in total
  • [31] A parallel Non-Local means denoising algorithm implementation with OpenMP and OpenCL on Intel Xeon Phi Coprocessor
    Zhu, Huming
    Wu, Yanfei
    Li, Pei
    Wang, Duo
    Shi, Wei
    Zhang, Peng
    Jiao, Licheng
    JOURNAL OF COMPUTATIONAL SCIENCE, 2016, 17 : 591 - 598
  • [32] Hybrid MPI+OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction
    Gorobets, A.
    Trias, F. X.
    Borrell, R.
    Lehmkuhl, O.
    Oliva, A.
    COMPUTERS & FLUIDS, 2011, 49 (01) : 101 - 109
  • [33] A Parallel Algorithm Based On OpenMP+STM for FPGA Timing-Driven Placement
    Zhang, Jia-qi
    Lv, Hui-juan
    Tan, Li-bo
    Pan, Tao-tao
    COMPUTER SCIENCE AND TECHNOLOGY (CST2016), 2017, : 1185 - 1193
  • [34] Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks
    Richard, Jerome
    Latu, Guillaume
    Bigot, Julien
    Gautier, Thierry
    EURO-PAR 2019: PARALLEL PROCESSING, 2019, 11725 : 419 - 433
  • [35] A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
    Mininni, Pablo D.
    Rosenberg, Duane
    Reddy, Raghu
    Pouquet, Annick
    PARALLEL COMPUTING, 2011, 37 (6-7) : 316 - 326
  • [36] Predicting Software Defects in Hybrid MPI and OpenMP Parallel Programs Using Machine Learning
    Althiban, Amani S.
    Alharbi, Hajar M.
    Al Khuzayem, Lama A.
    Eassa, Fathy Elbouraey
    ELECTRONICS, 2024, 13 (01)
  • [37] A Framework for an Automatic Hybrid MPI+OpenMP code generation
    Hamidouche, Khaled
    Falcou, Joel
    Etiemble, Daniel
    HIGH PERFORMANCE COMPUTING SYMPOSIUM 2011 (HPC 2011) - 2011 SPRING SIMULATION MULTICONFERENCE - BK 6 OF 8, 2011, 43 (02) : 48 - 55
  • [38] MPI+OpenCL implementation of a phase-field method incorporating CALPHAD description of Gibbs energies on heterogeneous computing platforms
    Tennyson, P. Gerald
    Karthik, G. M.
    Phanikumar, G.
    COMPUTER PHYSICS COMMUNICATIONS, 2015, 186 : 48 - 64
  • [39] Investigating Dependency Graph Discovery Impact on Task-based MPI+OpenMP Applications Performances
    Pereira, Romain
    Roussel, Adrien
    Carribault, Patrick
    Gautier, Thierry
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 163 - 172
  • [40] Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models
    Hashmi, Jahanzeb Maqbool
    Hamidouche, Khaled
    Panda, Dhabaleswar K.
    PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 1180 - 1187