Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster

Times cited: 139
Authors
Xian, Wang [1]
Takayuki, Aoki [1]
Affiliation
[1] Tokyo Inst Technol, Global Sci Informat & Comp Ctr, Tokyo 1528550, Japan
Funding
Japan Science and Technology Agency (JST)
Keywords
GPU; Lattice Boltzmann method; Multi-node GPU cluster; Parallel; Data communication; Domain partitioning; Overlapping mode; Large-scaled;
DOI
10.1016/j.parco.2011.02.007
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
GPGPU has drawn much attention as a way to accelerate non-graphics applications. A simulation with the D3Q19 model of the lattice Boltzmann method was executed successfully on a multi-node GPU cluster using CUDA programming and the MPI library. The GPU code runs on TSUBAME, the multi-node GPU cluster of the Tokyo Institute of Technology, which is equipped with a total of 680 NVIDIA Tesla GPUs. For multi-GPU computation, a domain partitioning method is used to distribute the computational load over multiple GPUs, and GPU-to-GPU data transfer becomes a severe overhead for the total performance. The parallel results for 1D, 2D and 3D domain partitioning were compared and analyzed. With a 384 x 384 x 384 mesh system and 96 GPUs, the performance with 3D partitioning is about 3-4 times higher than that with 1D partitioning. The performance curve deviates from the ideal line because of the long communication time between GPUs. To hide the communication time, we introduced a technique that overlaps computation and communication, in which data transfer and computation run simultaneously in two streams. Using 8-96 GPUs, the performance increases by a factor of about 1.1-1.3 in the overlapping mode. As a benchmark problem, a large-scale computation of the flow around a sphere at Re = 13,000 was carried out successfully using a 2000 x 1000 x 1000 mesh system and 100 GPUs. For this computation with 2 giga (2 x 10^9) lattice nodes, 6.0 h were required to process 100,000 time steps. Under this condition, the computation time (2.79 h) and the data communication time (3.06 h) are almost the same. (C) 2011 Elsevier B.V. All rights reserved.
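To see why 3D partitioning communicates less than 1D partitioning, a rough geometric estimate of the halo each GPU must exchange per time step can help. The Python sketch below is a hypothetical illustration, not the authors' code. It assumes equal rectangular subdomains, a one-cell-thick halo on both faces of every split axis (an interior GPU), single-precision storage, and that only the 5 D3Q19 populations pointing across a face are sent. The helper names `exchanged_cells` and `best_grid` are made up here; `best_grid` simply searches for the process grid that minimises the halo.

```python
"""Sketch: per-GPU halo size for 1D/2D/3D partitioning of a cubic LBM mesh."""

from typing import Optional, Tuple

Q_CROSSING = 5       # D3Q19: 5 of the 19 populations point across each face
BYTES_PER_VALUE = 4  # assumption: single-precision (float32) distributions


def exchanged_cells(grid: Tuple[int, int, int], mesh: int) -> int:
    """Halo cells sent per time step by an interior GPU of a px*py*pz grid.

    Only axes that are split (p > 1) need an exchange; each such axis
    contributes two faces, each one cell thick.
    """
    sub = [mesh // p for p in grid]  # subdomain extent along x, y, z
    total = 0
    for axis, p in enumerate(grid):
        if p > 1:
            other = [sub[a] for a in range(3) if a != axis]
            total += 2 * other[0] * other[1]
    return total


def best_grid(n_gpus: int, split_axes: int, mesh: int) -> Optional[Tuple[int, int, int]]:
    """Process grid with exactly `split_axes` split axes that minimises the halo."""
    best, best_cost = None, None
    for px in range(1, n_gpus + 1):
        for py in range(1, n_gpus + 1):
            if n_gpus % (px * py):
                continue
            pz = n_gpus // (px * py)
            grid = (px, py, pz)
            if sum(p > 1 for p in grid) != split_axes:
                continue
            if any(mesh % p for p in grid):  # require an even split of the mesh
                continue
            cost = exchanged_cells(grid, mesh)
            if best_cost is None or cost < best_cost:
                best, best_cost = grid, cost
    return best


if __name__ == "__main__":
    MESH, GPUS = 384, 96
    for dims in (1, 2, 3):
        grid = best_grid(GPUS, dims, MESH)
        cells = exchanged_cells(grid, MESH)
        mb = cells * Q_CROSSING * BYTES_PER_VALUE / 1e6
        print(f"{dims}D grid {grid}: {cells} halo cells, ~{mb:.1f} MB per step")
```

For the 384 x 384 x 384 mesh on 96 GPUs, this estimate gives 294,912 halo cells per GPU for 1D slabs and 43,008 for a 4 x 4 x 6 grid. The ratio is purely geometric and need not match the 3-4x gain reported above, which comes from measured run times.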
Pages: 521-535
Number of pages: 15