Incompressible Fluid Simulation Parallelization with OpenMP, MPI and CUDA

Times cited: 0
|
Authors
Jiang, Xuan [1 ]
Lu, Laurence [2 ]
Song, Linyue [3 ]
Affiliations
[1] Univ Calif Berkeley, Civil & Environm Engn Dept, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Elect Engn & Comp Sci Dept, Berkeley, CA USA
[3] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA USA
Source
ADVANCES IN INFORMATION AND COMMUNICATION, FICC, VOL 2 | 2023 / Vol. 652
Keywords
OpenMP; MPI; CUDA; Fluid Simulation; Parallel Computation
DOI
10.1007/978-3-031-28073-3_28
CLC classification
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
We base our initial serial implementation on the original code presented in Jos Stam's paper. OpenMP was the easiest to add on top of it: because the solver is grid-based and OpenMP provides shared memory, the implementation required no mutexes or other data locks, and the pragmas could be inserted without inducing data races. We note, however, that the Gauss-Seidel method solves the linear system using intermediate values, so an update that reads neighboring cells already updated in the same sweep can introduce cascading errors. This issue is avoidable by looping over every cell in two passes such that each pass covers one color of a disjoint checkerboard pattern.

In the CUDA implementation, the set_bnd function for enforcing boundary conditions has two main parts, enforcing the edges and the corners, respectively. Because the corners depend on the edge values, this imposes a somewhat awkward structure in which we dedicate exactly one block with a single thread to an additional kernel that resolves the corners; this barely impacts performance, and the most time-consuming parts of our implementation are cudaMalloc and cudaMemcpy. The only synchronization primitive the code uses is __syncthreads(). We carefully avoided atomic operations, which would be expensive, but we need __syncthreads() at the end of diffuse, project, and advect because we reset the boundaries of the fluid after every diffusion and advection step. We also note that, without the two-pass method described for OpenMP above, the same data races are introduced here.

Similar to the OpenMP implementation, the pure MPI implementation inherits many features of the serial implementation; however, it also performs domain decomposition and the necessary communication. Synchronization is performed through these communication steps, and since the simulation is local in nature, there is no implicit global barrier and much of the computation can proceed almost asynchronously.
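As a concrete illustration of the two-pass checkerboard approach, the following is a minimal sketch in the style of Stam's lin_solve routine. The function name, signature, and the single sweep shown are our own assumptions, not the paper's code; Stam's solver repeats such sweeps (typically 20 times) per solve.

```c
#include <omp.h>

#define IX(i, j) ((i) + (N + 2) * (j))   /* Stam-style padded-grid indexing */

/* One red-black Gauss-Seidel sweep: each pass updates only cells of one
 * checkerboard color, so no thread reads a neighbor written in the same
 * pass, and the OpenMP loop is race-free without locks. */
void lin_solve_red_black(int N, float *x, const float *x0, float a, float c)
{
    for (int pass = 0; pass < 2; pass++) {
        #pragma omp parallel for
        for (int j = 1; j <= N; j++) {
            /* first i in this row with (i + j) % 2 == pass */
            for (int i = 1 + ((pass + j + 1) % 2); i <= N; i += 2) {
                x[IX(i, j)] = (x0[IX(i, j)] +
                               a * (x[IX(i - 1, j)] + x[IX(i + 1, j)] +
                                    x[IX(i, j - 1)] + x[IX(i, j + 1)])) / c;
            }
        }
    }
}
```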
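The edge/corner split of the boundary treatment might look like the sketch below. The kernel names and the edge-kernel launch configuration are assumptions; only the single-block, single-thread corner kernel is described in the abstract. Since kernels launched on the same stream serialize, the corner kernel reads edge values that are already final without any explicit synchronization.

```cuda
#define IX(i, j) ((i) + (N + 2) * (j))   /* Stam-style padded-grid indexing */

/* Edge kernel: one thread per boundary cell along each wall; b selects
 * which velocity component is mirrored, as in Stam's set_bnd. */
__global__ void set_bnd_edges(int N, int b, float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;
    if (i > N) return;
    x[IX(0,     i)] = (b == 1) ? -x[IX(1, i)] : x[IX(1, i)];
    x[IX(N + 1, i)] = (b == 1) ? -x[IX(N, i)] : x[IX(N, i)];
    x[IX(i,     0)] = (b == 2) ? -x[IX(i, 1)] : x[IX(i, 1)];
    x[IX(i, N + 1)] = (b == 2) ? -x[IX(i, N)] : x[IX(i, N)];
}

/* Corner kernel: launched as set_bnd_corners<<<1, 1>>>(N, x) after the
 * edge kernel, so each corner averages its two finalized edge neighbors. */
__global__ void set_bnd_corners(int N, float *x)
{
    x[IX(0,     0    )] = 0.5f * (x[IX(1, 0    )] + x[IX(0,     1)]);
    x[IX(0,     N + 1)] = 0.5f * (x[IX(1, N + 1)] + x[IX(0,     N)]);
    x[IX(N + 1, 0    )] = 0.5f * (x[IX(N, 0    )] + x[IX(N + 1, 1)]);
    x[IX(N + 1, N + 1)] = 0.5f * (x[IX(N, N + 1)] + x[IX(N + 1, N)]);
}
```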
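The abstract does not specify the decomposition axis; assuming a one-dimensional row-wise decomposition of the padded grid, the halo exchange behind the MPI communication steps could be sketched as follows (the function name and the rows parameter are hypothetical). Because each rank exchanges only with its immediate neighbors, the MPI_Sendrecv pairs supply all the ordering the local stencil needs, with no global barrier.

```c
#include <mpi.h>

/* Each rank owns `rows` interior rows of an (N+2)-wide Stam grid plus one
 * ghost row on each side, and swaps ghost rows with its neighbors before
 * each relaxation sweep. MPI_PROC_NULL turns boundary exchanges into no-ops. */
void exchange_halos(float *x, int rows, int N, int rank, int nprocs)
{
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;
    int W = N + 2;                 /* padded row width */

    /* send first interior row up, receive ghost row from below */
    MPI_Sendrecv(&x[1 * W],          W, MPI_FLOAT, up,   0,
                 &x[(rows + 1) * W], W, MPI_FLOAT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send last interior row down, receive ghost row from above */
    MPI_Sendrecv(&x[rows * W],      W, MPI_FLOAT, down, 1,
                 &x[0 * W],         W, MPI_FLOAT, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```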
Pages: 385-395
Number of pages: 11