High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor

Times cited: 4

Authors:

Robertsen, Fredrik [1,2]

Mattila, Keijo [3,4]

Westerholm, Jan [2]

Affiliations:

[1] CSC IT Ctr Sci, POB 405, FI-02101 Espoo, Finland

[2] Abo Akad Univ, Fac Sci & Engn, Vattenborgsvagen 3, FI-20500 Turku, Finland

[3] Univ Jyvaskyla, Fac Informat Technol, Jyvaskyla, Finland

[4] Tampere Univ Technol, Dept Phys, Tampere, Finland

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2019 / Vol. 31 / No. 13

Keywords:

Lattice Boltzmann; prefetching; SIMD; Xeon Phi

DOI:

10.1002/cpe.5072

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

We present a high-performance implementation of the lattice-Boltzmann method (LBM) on the Knights Landing generation of the Xeon Phi. The Knights Landing architecture includes 16 GB of high-speed memory (MCDRAM) with a reported bandwidth of over 400 GB/s, and a subset of the AVX-512 single instruction multiple data (SIMD) instruction set. We explain five implementation aspects that are critical for high performance on this architecture: (1) the choice of an appropriate LBM algorithm, (2) a suitable data layout, (3) vectorization of the computation, (4) data prefetching, and (5) running our LBM simulations exclusively from the MCDRAM. The effects of these implementation aspects on computational performance are demonstrated with a lattice-Boltzmann scheme based on the D3Q19 discrete velocity set and the two-relaxation-time (TRT) collision operator. In our benchmark simulations of fluid flow through porous media, using double-precision floating-point arithmetic, the observed performance exceeds 960 million fluid lattice site updates per second.
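As a rough plausibility check, the memory traffic implied by 960 million lattice site updates per second can be estimated. This back-of-the-envelope calculation is not taken from the paper. It assumes that each D3Q19 site update reads and writes each of the 19 distribution values exactly once in double precision (8 bytes each), and it ignores write-allocate traffic and any extra data needed to describe the porous-media geometry.

```latex
\begin{aligned}
B_{\text{site}} &= 2 \times 19 \times 8\ \text{B} = 304\ \text{B},\\
B_{\text{total}} &\approx 960 \times 10^{6}\ \text{s}^{-1} \times 304\ \text{B}
  \approx 2.92 \times 10^{11}\ \text{B/s} \approx 292\ \text{GB/s}.
\end{aligned}
```

Under these assumptions, the kernel would need roughly 292 GB/s of sustained bandwidth. That figure is below the more than 400 GB/s reported for MCDRAM, which is consistent with the abstract's emphasis on running the simulations exclusively from MCDRAM.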


Pages: 16
