Massively parallel lattice-Boltzmann codes on large GPU clusters

Cited by: 48
Authors
Calore, E. [1, 2]
Gabbana, A. [1]
Kraus, J. [3]
Pellegrini, E. [1]
Schifano, S. F. [1, 2]
Tripiccione, R. [1, 2]
Affiliations
[1] Univ Ferrara, Via Saragat 1, I-44122 Ferrara, Italy
[2] INFN Ferrara, Via Saragat 1, I-44122 Ferrara, Italy
[3] NVIDIA GmbH, Adenauerstr 20 A4, D-52146 Wurselen, Germany
Keywords
Lattice-Boltzmann; GPU accelerators; Massively parallel programming; Heterogeneous systems; Performance; Portability
DOI
10.1016/j.parco.2016.08.005
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper describes a massively parallel code for a state-of-the-art thermal lattice-Boltzmann method. Our code has been carefully optimized for performance on one GPU and to have a good scaling behavior extending to a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures, and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used for the development of other high-performance applications for computational physics. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 1-24
Number of pages: 24
Related papers
50 items in total
  • [1] Massively parallel lattice-Boltzmann simulation of turbulent channel flow
    Amati, G
    Succi, S
    Piva, R
    INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 1997, 8 (04): 869-877
  • [2] Lattice-Boltzmann hydrodynamics on parallel systems
    Kandhai, D
    Koponen, A
    Hoekstra, AG
    Kataja, M
    Timonen, J
    Sloot, PMA
    COMPUTER PHYSICS COMMUNICATIONS, 1998, 111 (1-3): 14-26
  • [3] A parallel lattice-Boltzmann method for large scale simulations of complex fluids
    Nekovee, M
    Chin, J
    González-Segredo, N
    Coveney, PV
    COMPUTATIONAL FLUID DYNAMICS, 2001: 204-212
  • [4] Benchmarking GPUs with a parallel Lattice-Boltzmann code
    Kraus, Jiri
    Pivanti, Marcello
    Schifano, Sebastiano Fabio
    Tripiccione, Raffaele
    Zanella, Marco
    2013 25TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 2013: 160-167
  • [5] A new GPU implementation for lattice-Boltzmann simulations on sparse geometries
    Tomczak, Tadeusz
    Szafran, Roman G.
    COMPUTER PHYSICS COMMUNICATIONS, 2019, 235: 258-278
  • [6] LUDWIG: A parallel Lattice-Boltzmann code for complex fluids
    Desplat, JC
    Pagonabarraga, I
    Bladon, P
    COMPUTER PHYSICS COMMUNICATIONS, 2001, 134 (03): 273-290
  • [7] A Lattice-Boltzmann solver for 3D fluid simulation on GPU
    Rinaldi, P. R.
    Dari, E. A.
    Venere, M. J.
    Clausse, A.
    SIMULATION MODELLING PRACTICE AND THEORY, 2012, 25: 163-171
  • [8] Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters
    Myre, J.
    Walsh, S. D. C.
    Lilja, D.
    Saar, M. O.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (04): 332-350
  • [9] A lattice-Boltzmann simulation study of the drag coefficient of clusters of spheres
    Beetstra, R.
    van der Hoef, M. A.
    Kuipers, J. A. M.
    COMPUTERS & FLUIDS, 2006, 35 (8-9): 966-970
  • [10] Parallel fluid flow simulations by means of a lattice-Boltzmann scheme
    Derksen, JJ
    Kooman, JL
    van den Akker, HEA
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1997, 1225: 524-530