High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor

Cited by: 4
Authors
Robertsen, Fredrik [1, 2]
Mattila, Keijo [3, 4]
Westerholm, Jan [2]
Affiliations
[1] CSC - IT Center for Science, P.O. Box 405, FI-02101 Espoo, Finland
[2] Åbo Akademi University, Faculty of Science and Engineering, Vattenborgsvägen 3, FI-20500 Turku, Finland
[3] University of Jyväskylä, Faculty of Information Technology, Jyväskylä, Finland
[4] Tampere University of Technology, Department of Physics, Tampere, Finland
Keywords
Lattice Boltzmann; prefetching; SIMD; Xeon Phi
DOI
10.1002/cpe.5072
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We present a high-performance implementation of the lattice-Boltzmann method (LBM) on the Knights Landing generation of Xeon Phi. The Knights Landing architecture includes 16 GB of high-speed memory (MCDRAM) with a reported bandwidth of over 400 GB/s. It also supports a subset of the AVX-512 single instruction multiple data (SIMD) instruction set. We explain five implementation aspects that are critical for high performance on this architecture:

(1) the choice of an appropriate LBM algorithm,
(2) a suitable data layout,
(3) vectorization of the computation,
(4) data prefetching, and
(5) running the LBM simulations exclusively from MCDRAM.

The effects of these aspects on computational performance are demonstrated with a lattice-Boltzmann scheme that uses the D3Q19 discrete velocity set and the two-relaxation-time (TRT) collision operator. In benchmark simulations of fluid flow through porous media, using double-precision floating-point arithmetic, the observed performance exceeds 960 million fluid lattice site updates per second.
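To make the D3Q19/TRT scheme and the data-layout point concrete, here is a minimal scalar sketch in C of a TRT collision over a structure-of-arrays (SoA) lattice. It is not the authors' code. The following are all hypothetical, chosen for illustration:

- the velocity ordering (opposite of direction i is i + 9),
- the function names `soa`, `site_moments`, `set_equilibrium` and `trt_collide_soa`,
- the use of the standard second-order equilibrium.

Streaming, boundary conditions, AVX-512 intrinsics and prefetching are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define Q 19 /* D3Q19: 19 discrete velocities */

/* Hypothetical ordering: 0 is the rest velocity; for i = 1..9,
   direction i + 9 is the opposite velocity -c_i. */
static const int CX[Q] = {0, 1, 0, 0, 1,  1, 1,  1, 0,  0,
                          -1, 0, 0, -1, -1, -1, -1, 0,  0};
static const int CY[Q] = {0, 0, 1, 0, 1, -1, 0,  0, 1,  1,
                          0, -1, 0, -1,  1,  0,  0, -1, -1};
static const int CZ[Q] = {0, 0, 0, 1, 0,  0, 1, -1, 1, -1,
                          0, 0, -1,  0,  0, -1,  1, -1,  1};

/* Lattice weights: 1/3 (rest), 1/18 (axis), 1/36 (diagonal). */
static const double W[Q] = {
    1.0 / 3.0,
    1.0 / 18.0, 1.0 / 18.0, 1.0 / 18.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0,
    1.0 / 18.0, 1.0 / 18.0, 1.0 / 18.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0};

/* SoA layout: population q of site s is stored at f[q * n + s], so for a
   fixed q consecutive sites are contiguous in memory. The site loop is
   therefore the natural one to spread across SIMD lanes (kept scalar here). */
static inline size_t soa(int q, size_t s, size_t n) { return (size_t)q * n + s; }

/* Density and momentum of site s. */
void site_moments(const double *f, size_t n, size_t s,
                  double *rho, double *jx, double *jy, double *jz) {
    double r = 0.0, x = 0.0, y = 0.0, z = 0.0;
    for (int q = 0; q < Q; ++q) {
        double fq = f[soa(q, s, n)];
        r += fq; x += CX[q] * fq; y += CY[q] * fq; z += CZ[q] * fq;
    }
    *rho = r; *jx = x; *jy = y; *jz = z;
}

/* Initialise site s to the second-order equilibrium for (rho, u). */
void set_equilibrium(double *f, size_t n, size_t s,
                     double rho, double ux, double uy, double uz) {
    assert(rho > 0.0);
    double usq = 1.5 * (ux * ux + uy * uy + uz * uz);
    for (int q = 0; q < Q; ++q) {
        double cu = CX[q] * ux + CY[q] * uy + CZ[q] * uz;
        f[soa(q, s, n)] = W[q] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - usq);
    }
}

/* TRT collision over n sites: the symmetric part of each (i, opposite) pair
   relaxes with lp = 1/tau+, the antisymmetric part with lm = 1/tau-. */
void trt_collide_soa(double *f, size_t n, double lp, double lm) {
    for (size_t s = 0; s < n; ++s) {
        double rho, jx, jy, jz;
        site_moments(f, n, s, &rho, &jx, &jy, &jz);
        double ux = jx / rho, uy = jy / rho, uz = jz / rho;
        double usq = 1.5 * (ux * ux + uy * uy + uz * uz);
        /* The rest population has no antisymmetric part. */
        size_t k0 = soa(0, s, n);
        f[k0] -= lp * (f[k0] - W[0] * rho * (1.0 - usq));
        for (int i = 1; i <= 9; ++i) {
            size_t ki = soa(i, s, n), ko = soa(i + 9, s, n);
            double cu = CX[i] * ux + CY[i] * uy + CZ[i] * uz;
            double eqp = W[i] * rho * (1.0 + 4.5 * cu * cu - usq); /* symmetric */
            double eqm = W[i] * rho * 3.0 * cu;                    /* antisymmetric */
            double fp = 0.5 * (f[ki] + f[ko]), fm = 0.5 * (f[ki] - f[ko]);
            double dp = lp * (fp - eqp), dm = lm * (fm - eqm);
            f[ki] -= dp + dm; /* direction i */
            f[ko] -= dp - dm; /* opposite direction: antisymmetric part flips sign */
        }
    }
}
```

In this formulation λ+ = `lp` fixes the kinematic viscosity, ν = (1/λ+ - 1/2)/3 in lattice units, while λ- = `lm` is a free parameter. Mass and momentum are conserved for any choice of the two rates.

As a rough consistency check on the headline figure, assume each site update loads and stores 19 doubles (304 bytes) with no write-allocate traffic. Under that assumption, 960 million updates per second corresponds to about 292 GB/s of memory traffic, which is below the reported MCDRAM bandwidth.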
Pages: 16