CPU/GPU COMPUTING FOR AN IMPLICIT MULTI-BLOCK COMPRESSIBLE NAVIER-STOKES SOLVER ON HETEROGENEOUS PLATFORM

Times cited: 4

Authors:

Deng, Liang [1]

Bai, Hanli [1]

Wang, Fang [2]

Xu, Qingxin [1]

Affiliations:

[1] China Aerodynamics Research and Development Center, Computational Aerodynamics Institute, Mianyang, Sichuan, People's Republic of China

[2] National University of Defense Technology, School of Computer, Changsha, Hunan, People's Republic of China

Source:

PROCEEDINGS OF THE SIXTH INTERNATIONAL SYMPOSIUM ON PHYSICS OF FLUIDS (ISPF6) | 2016 / Vol. 42

Keywords:

Multi-block; structured grid; alternating direction implicit; CFD solver; MPI-OpenMP-CUDA; CPU/GPU computing

DOI:

10.1142/S2010194516601630

Chinese Library Classification (CLC):

O35 [Fluid mechanics]; O53 [Plasma physics]

Subject classification codes:

070204; 080103; 080704

Abstract:

CPU/GPU computing allows scientists to greatly accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations, taken from our in-house Computational Fluid Dynamics (CFD) software, on a heterogeneous platform. First, we implement a full GPU version of the ADI solver to eliminate many redundant data transfers between the CPU and GPU, and then design two fine-grain schemes, "one-thread-one-point" and "one-thread-one-line", to maximize performance. Second, we present a dual-level parallelization scheme based on a CPU/GPU collaborative model that exploits the computational resources of both the multi-core CPUs and the many-core GPUs within the heterogeneous platform. Finally, because the memory of a single node becomes insufficient as the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that combines fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy that overlaps computation with communication using advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU relative to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvements on heterogeneous platforms.
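The abstract names two fine-grain thread mappings without defining them. The sketch below is one plausible reading, emulated serially in plain Python: under "one-thread-one-point", each thread owns a single grid point (i, j, k); under "one-thread-one-line", each thread owns a whole grid line and solves the implicit system along it, as an ADI sweep requires. All function names, the i-fastest memory layout, and the scalar tridiagonal model operator (1 + 2*lam)*u[i] - lam*(u[i-1] + u[i+1]) with zero values outside the line are assumptions for illustration only; the paper's solver works on the compressible Navier-Stokes equations, whose line systems are not described in this record.

```python
# A minimal serial sketch (hypothetical names) of the two fine-grain
# thread mappings named in the abstract, applied to a flat 3-D array
# stored with i varying fastest: index = i + ni*(j + nj*k).

def point_index(tid, ni, nj):
    """One-thread-one-point: map a flat thread id to its grid point (i, j, k)."""
    i = tid % ni
    j = (tid // ni) % nj
    k = tid // (ni * nj)
    return i, j, k


def line_index(tid, nj):
    """One-thread-one-line: map a flat thread id to the (j, k) line along i."""
    return tid % nj, tid // nj


def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    No pivoting, so the matrix should be diagonally dominant.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def sweep_i(rhs, ni, nj, nk, lam):
    """One implicit sweep along i using the one-thread-one-line mapping.

    Each 'thread' solves (1 + 2*lam)*u[i] - lam*(u[i-1] + u[i+1]) = rhs[i]
    on its own line, with u = 0 just outside both ends (assumed model
    operator, not the paper's Navier-Stokes line system).
    """
    out = [0.0] * (ni * nj * nk)
    for tid in range(nj * nk):  # on a GPU: one CUDA thread per tid
        j, k = line_index(tid, nj)
        base = ni * (j + nj * k)
        a = [-lam] * ni
        b = [1.0 + 2.0 * lam] * ni
        c = [-lam] * ni
        out[base:base + ni] = thomas(a, b, c, rhs[base:base + ni])
    return out
```

On a GPU, the loop over `tid` in `sweep_i` would become a kernel launch with one thread per line, and sweeps along j and k follow the same pattern with different strides. Because each line solve is sequential, the line-per-thread mapping exposes fewer threads (nj*nk) than the point-per-thread mapping (ni*nj*nk), which suits pointwise work such as residual evaluation better.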

Pages: 14

References: 20 in total

