Performance engineering for the lattice Boltzmann method on GPGPUs: Architectural requirements and performance results

Cited by: 43
Authors
Habich, J. [1]
Feichtinger, C. [2]
Koestler, H. [2]
Hager, G. [1]
Wellein, G. [1, 2]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, Erlangen, Germany
[2] Univ Erlangen Nurnberg, Dept Comp Sci, Erlangen, Germany
Keywords
Parallelization; GPGPU; HPC; CUDA; OpenCL; Computational fluid dynamics; Lattice Boltzmann method; Performance modeling and engineering
DOI
10.1016/j.compfluid.2012.02.013
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
GPUs offer several times the floating-point performance and memory bandwidth of current standard two-socket CPU compute nodes, e.g. NVIDIA C2070 vs. Intel Xeon Westmere X5650. The lattice Boltzmann method (LBM) has been established as a flow solver in recent years and was one of the first flow solvers to be successfully ported to GPUs with a performance benefit. We demonstrate advanced optimization strategies for a D3Q19 lattice Boltzmann based incompressible flow solver for GPGPUs and CPUs. Since the implemented algorithm is limited by memory bandwidth, we concentrate on improving memory access. Basic data layout issues for optimal data access are explained and discussed. Furthermore, the algorithmic steps are rearranged to improve scattered access to the GPU memory. The importance of occupancy is discussed, as well as optimization strategies to improve overall concurrency. We obtain a well-optimized GPU kernel, which is integrated into a larger framework that can handle single-phase fluid flow simulations as well as particle-laden flows. Our 3D LBM GPU implementation reaches up to 650 MLUPS (million lattice updates per second) in single precision and 290 MLUPS in double precision on an NVIDIA Tesla C2070 as well as an AMD 6970. © 2012 Elsevier Ltd. All rights reserved.
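The abstract's statement that the kernel is limited by memory bandwidth can be made concrete with a simple traffic estimate. The sketch below is not taken from the paper. It assumes that each lattice update reads and writes all 19 distribution values of the D3Q19 model exactly once and that no other memory traffic occurs (no flag or index data, no write-allocate overhead). Under that assumption one update moves 152 bytes in single precision and 304 bytes in double precision, so the reported 650 and 290 MLUPS would correspond to roughly 98.8 and 88.2 GB/s of sustained bandwidth. The function names are illustrative only.

```python
# Minimal bandwidth model for a D3Q19 lattice Boltzmann kernel.
# Assumption (not from the paper): every cell update loads 19 values and
# stores 19 values, and nothing else touches main memory.

Q = 19  # number of discrete velocities in the D3Q19 model


def bytes_per_update(bytes_per_value: int) -> int:
    """Bytes moved per lattice cell update: Q loads plus Q stores."""
    return 2 * Q * bytes_per_value


def mlups_limit(bandwidth_gbs: float, bytes_per_value: int) -> float:
    """Upper bound on MLUPS for a given sustained bandwidth in GB/s."""
    updates_per_second = bandwidth_gbs * 1e9 / bytes_per_update(bytes_per_value)
    return updates_per_second / 1e6


def implied_bandwidth_gbs(mlups: float, bytes_per_value: int) -> float:
    """Sustained bandwidth in GB/s implied by a measured MLUPS figure."""
    return mlups * 1e6 * bytes_per_update(bytes_per_value) / 1e9


if __name__ == "__main__":
    # MLUPS figures are the ones quoted in the abstract.
    for label, size, mlups in (("single", 4, 650.0), ("double", 8, 290.0)):
        bw = implied_bandwidth_gbs(mlups, size)
        print(f"{label} precision: {bytes_per_update(size)} B/update, "
              f"{mlups:.0f} MLUPS implies about {bw:.1f} GB/s")
```

Calling `mlups_limit` with a measured streaming bandwidth for a given device gives a rough ceiling against which a kernel's MLUPS can be compared. The estimate is only as good as the traffic assumption stated above.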
Pages: 276-282
Number of pages: 7