GPU-ACCELERATED IMPLICIT TURBOMACHINERY FLOW SOLVER USING MULTIPLE MPI COMMUNICATORS

Cited: 0
Authors
Wang, Feng [1 ]
di Mare, Luca [1 ]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Oxford Thermofluids Inst, Oxford, England
Source
PROCEEDINGS OF ASME TURBO EXPO 2023: TURBOMACHINERY TECHNICAL CONFERENCE AND EXPOSITION, GT2023, VOL 13C | 2023
Keywords
OPTIMIZATIONS;
DOI
Not available
CLC Classification Code
V [Aeronautics, Astronautics]
Subject Classification Code
08; 0825
Abstract
This paper describes a procedure to port an unstructured CFD solver to GPUs for turbomachinery simulations via NVIDIA OpenACC. Matrix-free implicit time stepping is used for robust and efficient time integration. Data movement between the device and the host is managed explicitly and minimized for optimal performance. For multi-GPU computations, a flexible multistage decomposition is devised. The computational domain is first divided into components (or sub-domains), each of which represents a physical entity, such as a blade row. A graph partition is then performed for each sub-domain. Data exchange among multiple GPUs is devised to accommodate this domain decomposition strategy. For halo exchange within each sub-domain, data can be sent directly from one GPU to another if the hardware permits. To exchange data among different sub-domains, a general-purpose coupler exchanges data among the GPUs via the host. In terms of computational speed, for a socket-to-socket comparison (one V100 versus 20 CPU cores), the GPU code shows a global speedup of 8.1 for double precision and 10.1 for single precision. Furthermore, the GPU code is 5.5 and 6.9 times more energy efficient for double and single precision, respectively. The GPU code is validated on Rotor 37 and a high-speed multistage compressor, and good agreement with rig data is observed for both cases. The multistage compressor case has roughly 17 million cells, and the GPU code takes 8 hours to produce 4 revolutions on 4 V100 GPUs. The developed GPU code represents a flexible computational framework for future multi-fidelity/multi-physics simulations in heterogeneous computing environments.
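The abstract describes a two-level communication pattern: an MPI communicator per sub-domain (blade row) for halo exchange, with GPU-direct transfers where the hardware allows, and a host-staged coupler for exchanges between sub-domains. The sketch below illustrates that pattern in C with MPI and OpenACC; it is not the authors' code, and the rank-to-row mapping, neighbour choice, buffer sizes, and variable names are all hypothetical placeholders.

#include <mpi.h>
#include <openacc.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Bind each rank to a GPU on its node (assumes one MPI rank per GPU). */
    int ndev = acc_get_num_devices(acc_device_nvidia);
    acc_set_device_num(world_rank % ndev, acc_device_nvidia);

    /* Stage 1: split ranks by sub-domain (blade row) id; halo exchange
     * stays inside this communicator.  The mapping below is a placeholder. */
    int row_id = world_rank / 2;
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, row_id, world_rank, &row_comm);

    int row_rank, row_size;
    MPI_Comm_rank(row_comm, &row_rank);
    MPI_Comm_size(row_comm, &row_size);

    /* Example halo buffers kept resident on the device. */
    int nhalo = 1 << 16;
    double *send = malloc(nhalo * sizeof(double));
    double *recv = malloc(nhalo * sizeof(double));

    #pragma acc data copyin(send[0:nhalo]) copyout(recv[0:nhalo])
    {
        /* Intra-sub-domain halo exchange: pass device pointers straight to
         * MPI, which works when a CUDA-aware MPI and suitable hardware
         * (e.g. GPU-direct) are available. */
        int partner = row_rank ^ 1;   /* placeholder neighbour */
        if (partner < row_size) {
            #pragma acc host_data use_device(send, recv)
            {
                MPI_Sendrecv(send, nhalo, MPI_DOUBLE, partner, 0,
                             recv, nhalo, MPI_DOUBLE, partner, 0,
                             row_comm, MPI_STATUS_IGNORE);
            }
        }

        /* Inter-sub-domain exchange goes through the host: refresh the host
         * copies, let a general-purpose coupler communicate on
         * MPI_COMM_WORLD, then push the result back to the device. */
        #pragma acc update self(send[0:nhalo])
        /* ... coupler exchange between blade rows ... */
        #pragma acc update device(recv[0:nhalo])
    }

    free(send);
    free(recv);
    MPI_Comm_free(&row_comm);
    MPI_Finalize();
    return 0;
}

The split mirrors the paper's description of one communicator per physical component: device-to-device transfers are confined to `row_comm`, while the host-staged coupler handles the component-to-component traffic that cannot be assumed to be GPU-direct.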
Pages: 12