Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Cited by: 29
Authors
Lai, Jianqi [1 ]
Yu, Hang [1 ]
Tian, Zhengyu [1 ]
Li, Hua [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Aerosp Sci & Engn, Changsha 410073, Peoples R China
Keywords
DIRECT NUMERICAL-SIMULATION; FLOW SOLVER; MESHLESS METHOD; OPTIMIZATION; CPU/GPU; SEQUEL; SCHEME; GRIDS;
DOI
10.1155/2020/8862123
Chinese Library Classification: TP31 [Computer Software];
Discipline codes: 081202; 0835;
Abstract
Graphics processing units (GPUs) offer strong floating-point performance and high memory bandwidth for data-parallel workloads and have been widely used in high-performance computing (HPC). The compute unified device architecture (CUDA) serves as a parallel computing platform and programming model that reduces the complexity of GPU programming. Programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm combining the message passing interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+-up upwind scheme and the three-step Runge-Kutta method are used for spatial and temporal discretization, respectively. Turbulence is modeled with the k-omega SST two-equation model. The CPU only manages GPU execution and communication, while the GPU is responsible for data processing. Parallel-execution and memory-access optimizations are applied to the GPU-based CFD codes. We propose a nonblocking communication method that fully overlaps GPU computing, CPU-CPU communication, and CPU-GPU data transfer by creating two CUDA streams. Furthermore, a one-dimensional domain decomposition is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on compressible turbulent flow over a flat plate. The performance of a single-GPU implementation and the scalability of multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization achieves a speedup of more than 36 times over CPU-based parallel computing, and the parallel algorithm has good scalability.
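The two-stream overlap described in the abstract can be sketched as follows. This is a minimal host-side sketch, not the authors' code: the kernel names (`boundaryFlux`, `interiorFlux`), the halo buffers, and the neighbor rank `nbr` are hypothetical placeholders, and error checking is omitted.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Two streams: one for interior cells (no neighbor data needed), one for
// boundary cells plus the device<->host copies that feed the MPI halo exchange.
cudaStream_t interior, boundary;
cudaStreamCreate(&interior);
cudaStreamCreate(&boundary);

for (int step = 0; step < nsteps; ++step) {
    // 1. Launch the boundary-cell kernel first on its own stream so the halo
    //    data becomes available as early as possible.
    boundaryFlux<<<gridB, block, 0, boundary>>>(d_q, d_res);

    // 2. Interior work runs concurrently on the other stream.
    interiorFlux<<<gridI, block, 0, interior>>>(d_q, d_res);

    // 3. Copy halo data to the host (ordered after the boundary kernel within
    //    the 'boundary' stream), then exchange it with nonblocking MPI calls.
    cudaMemcpyAsync(h_send, d_send, haloBytes, cudaMemcpyDeviceToHost, boundary);
    cudaStreamSynchronize(boundary);

    MPI_Request reqs[2];
    MPI_Irecv(h_recv, haloCount, MPI_DOUBLE, nbr, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, haloCount, MPI_DOUBLE, nbr, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    // 4. Upload the received halo while the interior kernel may still be running.
    cudaMemcpyAsync(d_recv, h_recv, haloBytes, cudaMemcpyHostToDevice, boundary);

    // 5. Both streams must finish before the next Runge-Kutta stage.
    cudaDeviceSynchronize();
}
```

For `cudaMemcpyAsync` to actually overlap with kernel execution, the host buffers `h_send` and `h_recv` must be page-locked (allocated with `cudaMallocHost`); copies from pageable memory fall back to synchronous behavior.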
Pages: 15
Related References
52 records in total
[31]   A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence [J].
Mininni, Pablo D. ;
Rosenberg, Duane ;
Reddy, Raghu ;
Pouquet, Annick .
PARALLEL COMPUTING, 2011, 37 (6-7) :316-326
[32]   Designing a benchmark for the performance evaluation of agent-based simulation applications on HPC [J].
Moreno, Andreu ;
Rodriguez, Juan J. ;
Beltran, Daniel ;
Sikora, Anna ;
Jorba, Josep ;
Cesar, Eduardo .
JOURNAL OF SUPERCOMPUTING, 2019, 75 (03) :1524-1550
[33]   Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey [J].
Nguyen, Giang ;
Dlugolinsky, Stefan ;
Bobak, Martin ;
Viet Tran ;
Lopez Garcia, Alvaro ;
Heredia, Ignacio ;
Malik, Peter ;
Hluchy, Ladislav .
ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (01) :77-124
[34]   Performance Optimization of 3D Lattice Boltzmann Flow Solver on a GPU [J].
Nhat-Phuong Tran ;
Lee, Myungho ;
Hong, Sugwon .
SCIENTIFIC PROGRAMMING, 2017, 2017
[35]   Recent progress and challenges in exploiting graphics processors in computational fluid dynamics [J].
Niemeyer, Kyle E. ;
Sung, Chih-Jen .
JOURNAL OF SUPERCOMPUTING, 2014, 67 (02) :528-564
[36]
NVIDIA Corporation, 2019, NVIDIA CUDA C Programming Guide
[37]   MPI-CUDA parallelization of a finite-strip program for geometric nonlinear analysis: A hybrid approach [J].
Rakic, P. S. ;
Milasinovic, D. D. ;
Zivanov, Z. ;
Suvajdzin, Z. ;
Nikolic, M. ;
Hajdukovic, M. .
ADVANCES IN ENGINEERING SOFTWARE, 2011, 42 (05) :273-285
[38]   A GPU-accelerated solver for turbulent flow and scalar transport based on the Lattice Boltzmann method [J].
Ren, Feng ;
Song, Baowei ;
Zhang, Ya ;
Hu, Haibao .
COMPUTERS & FLUIDS, 2018, 173 :29-36
[39]   Many-integrated core (MIC) technology for accelerating Monte Carlo simulation of radiation transport: A study based on the code DPM [J].
Rodriguez, M. ;
Brualla, L. .
COMPUTER PHYSICS COMMUNICATIONS, 2018, 225 :28-35
[40]   GPU accelerated flow solver for direct numerical simulation of turbulent flows [J].
Salvadore, Francesco ;
Bernardini, Matteo ;
Botti, Michela .
JOURNAL OF COMPUTATIONAL PHYSICS, 2013, 235 :129-142