Multidisciplinary simulation acceleration using multiple shared memory graphical processing units

Cited by: 0
Authors
Kemal, Jonathan Y. [1 ]
Davis, Roger L. [1 ]
Owens, John D. [2 ]
Affiliations
[1] Univ Calif Davis, Dept Mech & Aerosp Engn, 1 Shields Ave, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Elect & Comp Engn, Engn & Entrepreneurship, Davis, CA 95616 USA
Keywords
Computational fluid dynamics; CFD; GPU; CUDA; parallel computing; solvers
DOI
10.1177/1094342016639114
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this article, we describe the strategies and programming techniques used in porting a multidisciplinary fluid/thermal interaction procedure to graphical processing units (GPUs). We discuss the strategies for selecting which disciplines or routines are chosen for use on GPUs rather than CPUs. In addition, we describe the programming techniques, including use of the Compute Unified Device Architecture (CUDA), mixed-language (Fortran/C/CUDA) usage, Fortran/C memory mapping of arrays, and GPU optimization. We solve all equations using the multi-block, structured-grid, finite-volume numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable GPUs produced by NVIDIA. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture and compare the resulting performance against Intel Xeon X5690 CPUs. Individual solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate-cylinder test case and ran a turbulent steady-flow simulation on four increasingly dense computational grids. Our densest grid is divided into 13 blocks, each containing 1033 × 1033 grid points, for a total of 13.87 million grid points, or 1.07 million grid points per domain block. Comparing the performance of eight GPUs to that of eight CPUs, we obtain an overall speedup of about 6.0 on the densest grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.
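The abstract's mention of mixed-language (Fortran/C/CUDA) usage and Fortran/C memory mapping of arrays can be made concrete with a short sketch. The code below is illustrative only and is not taken from the authors' solver: a hypothetical point-Jacobi smoothing kernel (smooth_kernel) indexes the device array in Fortran's column-major order, so a 2-D field allocated in Fortran can be copied to the GPU without transposition, and the C wrapper (smooth_on_gpu_) follows the common trailing-underscore, pass-by-reference convention for calling C from Fortran. All routine names and the smoothing operation itself are assumptions.

// residual_smooth.cu -- illustrative sketch; kernel and wrapper names are
// hypothetical and not taken from the paper's solver.
#include <cuda_runtime.h>

// Fortran stores a (ni,nj) array column-major: Fortran element (i,j)
// (1-based) lives at linear offset (i-1) + (j-1)*ni. With 0-based
// indices on the device, the same layout is i + j*ni.
#define IDX(i, j, ni) ((i) + (j) * (ni))

// Point-Jacobi smoothing of a scalar field as a stand-in for one solver
// routine ported to CUDA: each thread updates one interior grid point.
__global__ void smooth_kernel(const double *q, double *qnew, int ni, int nj)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i > 0 && i < ni - 1 && j > 0 && j < nj - 1) {
        qnew[IDX(i, j, ni)] = 0.25 * (q[IDX(i - 1, j, ni)] +
                                      q[IDX(i + 1, j, ni)] +
                                      q[IDX(i, j - 1, ni)] +
                                      q[IDX(i, j + 1, ni)]);
    }
}

// C wrapper with a trailing underscore and pass-by-reference arguments,
// the usual convention for a Fortran call site such as:
//   call smooth_on_gpu(q, qnew, ni, nj)
extern "C" void smooth_on_gpu_(double *q, double *qnew, int *ni_p, int *nj_p)
{
    int ni = *ni_p, nj = *nj_p;
    size_t bytes = (size_t)ni * nj * sizeof(double);
    double *d_q, *d_qnew;

    cudaMalloc(&d_q, bytes);
    cudaMalloc(&d_qnew, bytes);
    // The Fortran array is already in the layout IDX expects, so a flat
    // copy preserves (i,j) addressing on the device; boundary values in
    // qnew are seeded from q since the kernel only touches the interior.
    cudaMemcpy(d_q, q, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_qnew, q, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((ni + block.x - 1) / block.x, (nj + block.y - 1) / block.y);
    smooth_kernel<<<grid, block>>>(d_q, d_qnew, ni, nj);

    cudaMemcpy(qnew, d_qnew, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_q);
    cudaFree(d_qnew);
}

Keeping the device indexing column-major avoids a transpose on every host-device transfer, which is the usual reason mixed Fortran/CUDA solvers preserve the Fortran array layout end to end.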
Pages: 486-508
Page count: 23