Optimization and acceleration of flow simulations for CFD on CPU/GPU architecture

Cited by: 13

Authors:

Lei, Jiang [1]

Li, Da-li [1]

Zhou, Yun-long [1]

Liu, Wei [1]

Affiliation:

[1] National University of Defense Technology, College of Aerospace Science and Engineering, Changsha 410073, Hunan, People's Republic of China

Source:

JOURNAL OF THE BRAZILIAN SOCIETY OF MECHANICAL SCIENCES AND ENGINEERING | 2019 / Vol. 41 / Issue 07

Keywords:

Euler equation; GPU; CUDA; CFD; DIRECT NUMERICAL-SIMULATION; INCOMPRESSIBLE FLOWS; SOLVER

DOI:

10.1007/s40430-019-1793-9

Chinese Library Classification (CLC) number:

TH [Machinery and instrument industry]

Discipline classification code:

0802

Abstract:

With the growing demand for computational power in computational fluid dynamics (CFD), graphics processing units (GPUs), with their high floating-point throughput, play an increasingly important role. This work ports an Euler solver using the MUSCL and NND schemes from central processing units (CPUs) to three different CPU/GPU heterogeneous hardware platforms, and investigates the resulting acceleration for the one-dimensional (1D) Riemann problem and two-dimensional (2D) flow past a forward-facing step. The paper first introduces how heterogeneous systems work, in terms of hardware structure, memory model and programming method. It then presents in detail the three heterogeneous methods employed in the study; of these, porting all parts of the solver loop to the GPU performed best. Several optimization strategies suited to the solver were adopted to achieve substantial speedups, including the use of GPU shared memory, which is relatively rarely reported in the CFD literature. Finally, the 1D Riemann simulations verified the reliability of the modified GPU code and showed that both schemes capture discontinuities well. With the 1D computational domain discretized into 10,000 cells, both cases achieved a speedup exceeding 25 relative to execution on a single CPU core. For the 2D step flow, the highest speedups were 260 for the MUSCL scheme on an 800 × 400 mesh and 144 for the NND scheme on a 400 × 200 computational domain.
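As a rough illustration of the kind of per-interface work a MUSCL-type scheme performs, the following is a minimal Python sketch of second-order reconstruction for a 1D scalar field. It is not the authors' CUDA code. The minmod limiter and the function names `minmod` and `muscl_interface_states` are assumptions made for this example, since the record does not state which limiter the paper uses.

```python
def minmod(a, b):
    """Return the argument of smaller magnitude if signs agree, else 0."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_interface_states(u):
    """Limited linear reconstruction of cell averages u (a list of floats).

    Returns (left, right): the states on either side of the interface
    between cells i and i+1, for i = 1 .. len(u) - 3, i.e. interfaces
    whose two neighbouring cells both have a well-defined slope.
    """
    n = len(u)
    slope = [0.0] * n
    # Limited slope in each interior cell (assumed limiter: minmod).
    for i in range(1, n - 1):
        slope[i] = minmod(u[i] - u[i - 1], u[i + 1] - u[i])

    left, right = [], []
    # Each interface is independent of the others, so in a GPU port
    # this loop body could plausibly be assigned to one thread.
    for i in range(1, n - 2):
        left.append(u[i] + 0.5 * slope[i])            # extrapolated from cell i
        right.append(u[i + 1] - 0.5 * slope[i + 1])   # extrapolated from cell i+1
    return left, right
```

On smooth (linear) data the two states coincide at each interface, while at a jump the limiter sets the slope to zero so no new extrema are introduced. The resulting interface states would normally be passed to a Riemann solver or flux function, which is omitted here.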


Pages: 15

