GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques

Cited by: 25
Authors
Zhu, Lianhua [1 ]
Wang, Peng [1 ]
Chen, Songze [2 ]
Guo, Zhaoli [2 ]
Zhang, Yonghao [1 ]
Affiliations
[1] Univ Strathclyde, Dept Mech & Aerosp Engn, James Weir Fluids Lab, Glasgow G1 1XJ, Lanark, Scotland
[2] Huazhong Univ Sci & Technol, Sch Energy & Power, State Key Lab Coal Combust, Wuhan 430074, Hubei, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council; European Union Horizon 2020; US National Science Foundation;
Keywords
GPU; CUDA; Discrete velocity method; Gas-kinetic equation; High performance computing; DISCRETE VELOCITY GRIDS; STEADY-STATE SOLUTIONS; BOLTZMANN-EQUATION; IMPLICIT SCHEME; POROUS-MEDIA; FLOW; SOLVERS; ALGORITHM; CONTINUUM; SIMULATIONS;
DOI
10.1016/j.cpc.2019.106861
Chinese Library Classification
TP39 [Computer Applications];
Discipline Code
081203; 0835;
Abstract
This paper presents a Graphics Processing Unit (GPU) acceleration of an iteration-based discrete velocity method (DVM) for gas-kinetic model equations. Unlike previous GPU parallelizations of explicit kinetic schemes, this work is based on a fast-converging iterative scheme. The memory reduction techniques previously proposed for the DVM are applied to GPU computing, enabling full three-dimensional (3D) solutions of kinetic model equations on contemporary GPUs, which usually have limited memory capacity, whereas such solutions would otherwise require terabytes of memory. The GPU algorithm is validated against direct simulation Monte Carlo (DSMC) simulations of the 3D lid-driven cavity flow and the supersonic rarefied gas flow past a cube, with up to 0.7 trillion phase-space grid points. Performance profiling on three GPU models shows that the two main kernel functions can utilize 56%–79% of the GPU computing and memory resources. The performance of the GPU algorithm is compared with a typical parallel CPU implementation of the same algorithm using the Message Passing Interface (MPI). For the 3D lid-driven cavity flow, the GPU program achieves speedups of 1.2–2.8 on the K40 and 1.2–2.4 on the K80, respectively, compared with the MPI-parallelized CPU program running on 96 CPU cores. (C) 2019 Elsevier B.V. All rights reserved.
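To illustrate the kind of kernel structure and memory-reduction idea the abstract refers to, the following is a minimal CUDA sketch, not the authors' code: all names (moment_sweep, NV, d_c, d_w, the uniform velocity grid) are hypothetical. Each thread handles one spatial cell, rebuilds the discrete-velocity distribution in registers, and reduces it to macroscopic moments on the fly, so the full phase-space array f(x, xi) never has to reside in GPU memory.

```cuda
// Minimal sketch (not the paper's implementation): one thread per spatial cell
// rebuilds the discrete-velocity distribution in registers and reduces it to
// macroscopic moments on the fly, so only moment fields live in global memory.
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int NV = 24;            // number of discrete velocities (hypothetical)

__constant__ double d_c[NV];      // discrete velocity abscissae
__constant__ double d_w[NV];      // quadrature weights

__global__ void moment_sweep(const double* rho_in, const double* u_in,
                             double* rho_out, double* mom_out, int ncell)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ncell) return;

    double rho = rho_in[i];
    double u   = u_in[i];
    double rho_new = 0.0, mom_new = 0.0;

    for (int k = 0; k < NV; ++k) {
        double c = d_c[k];
        // Rebuild a Maxwellian-type distribution from the stored macroscopic state.
        // A real iterative DVM sweep would also add the transported/incoming part here.
        double f = rho * d_w[k] * exp(-(c - u) * (c - u));
        rho_new += f;             // zeroth moment
        mom_new += c * f;         // first moment
    }
    rho_out[i] = rho_new;         // only moments are written back to global memory
    mom_out[i] = mom_new;
}

int main()
{
    const int ncell = 1 << 16;
    const double sqrt_pi = std::sqrt(std::acos(-1.0));
    std::vector<double> h_c(NV), h_w(NV), h_rho(ncell, 1.0), h_u(ncell, 0.1);
    for (int k = 0; k < NV; ++k) {            // crude uniform velocity grid on [-4, 4]
        h_c[k] = -4.0 + 8.0 * k / (NV - 1);
        h_w[k] = 8.0 / (NV - 1) / sqrt_pi;
    }
    cudaMemcpyToSymbol(d_c, h_c.data(), NV * sizeof(double));
    cudaMemcpyToSymbol(d_w, h_w.data(), NV * sizeof(double));

    double *rho_in, *u_in, *rho_out, *mom_out;
    cudaMalloc(&rho_in,  ncell * sizeof(double));
    cudaMalloc(&u_in,    ncell * sizeof(double));
    cudaMalloc(&rho_out, ncell * sizeof(double));
    cudaMalloc(&mom_out, ncell * sizeof(double));
    cudaMemcpy(rho_in, h_rho.data(), ncell * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(u_in,   h_u.data(),   ncell * sizeof(double), cudaMemcpyHostToDevice);

    moment_sweep<<<(ncell + 255) / 256, 256>>>(rho_in, u_in, rho_out, mom_out, ncell);
    cudaDeviceSynchronize();

    double rho0;
    cudaMemcpy(&rho0, rho_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("cell 0 density after reduction: %f\n", rho0);   // ~1.0 for this grid

    cudaFree(rho_in); cudaFree(u_in); cudaFree(rho_out); cudaFree(mom_out);
    return 0;
}
```

Keeping the distribution in registers and writing back only moments is one way the memory footprint can scale with the spatial grid rather than the full phase-space grid; the actual memory reduction strategy and kernels in the paper may differ in detail.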
Pages: 14