Incompressible Fluid Simulation Parallelization with OpenMP, MPI and CUDA

Cited: 0
Authors
Jiang, Xuan [1 ]
Lu, Laurence [2 ]
Song, Linyue [3 ]
Affiliations
[1] Univ Calif Berkeley, Civil & Environm Engn Dept, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Elect Engn & Comp Sci Dept, Berkeley, CA USA
[3] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA USA
Source
ADVANCES IN INFORMATION AND COMMUNICATION, FICC, VOL 2 | 2023 / Vol. 652
关键词
OpenMP; MPI; CUDA; Fluid Simulation; Parallel Computation;
DOI
10.1007/978-3-031-28073-3_28
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We base our initial serial implementation on the original code presented in Jos Stam's paper. OpenMP was the easiest model to apply to this implementation: because the solver is grid-based and OpenMP operates on shared memory, no mutexes or other data locks were needed, and the pragmas could be inserted without introducing data races. We note, however, that the Gauss-Seidel method updates cells in place, so each update may read neighboring cells that were already updated in the same sweep; parallelizing the sweep naively can therefore introduce errors that cascade through the grid. This issue is avoided by looping over the cells in two passes, where each pass covers one half of a disjoint checkerboard pattern.

In the CUDA implementation, the set_bnd function that enforces boundary conditions has two main parts, handling the edges and the corners respectively. This imposes a somewhat awkward design in which we dedicate exactly one block with one thread to an additional kernel that resolves the corners, but this has almost no effect on performance; the most time-consuming parts of our implementation are cudaMalloc and cudaMemcpy. The only synchronization primitive the code uses is __syncthreads(). We deliberately avoided atomic operations, which would be comparatively expensive, but __syncthreads() is needed at the end of diffuse, project, and advect because the fluid boundaries are reset after every diffusion and advection step. We also note that, without the two-pass method described above for OpenMP, similar data races are present here.

Like the OpenMP version, the pure MPI implementation inherits many features of the serial implementation, but it additionally performs domain decomposition and the communication that decomposition requires. Synchronization is carried out through these communication steps; because the simulation is local in nature, there is no implicit global barrier, and much of the computation can proceed almost asynchronously.
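As a concrete illustration of the two-pass checkerboard idea, a red-black Gauss-Seidel sweep for a Stam-style lin_solve could be parallelized with OpenMP roughly as follows. This is a minimal sketch, not the paper's code: the IX macro and the fixed 20 sweeps follow Jos Stam's published solver, lin_solve_rb is a hypothetical name, and set_bnd is assumed to be the serial boundary routine. Within one pass all updated cells share the same color, so no thread reads a cell that another thread is writing in that pass.

#include <omp.h>

#define IX(i, j) ((i) + (N + 2) * (j))

void set_bnd(int N, int b, float *x);   /* serial boundary routine, assumed available */

void lin_solve_rb(int N, int b, float *x, const float *x0, float a, float c)
{
    for (int k = 0; k < 20; k++) {                 /* fixed number of relaxation sweeps */
        for (int color = 0; color < 2; color++) {  /* 0 = "red" cells, 1 = "black" cells */
            #pragma omp parallel for schedule(static)
            for (int j = 1; j <= N; j++) {
                /* first i in this row with (i + j) % 2 == color */
                for (int i = 2 - ((j + color) & 1); i <= N; i += 2) {
                    x[IX(i, j)] = (x0[IX(i, j)] +
                                   a * (x[IX(i - 1, j)] + x[IX(i + 1, j)] +
                                        x[IX(i, j - 1)] + x[IX(i, j + 1)])) / c;
                }
            }
        }
        set_bnd(N, b, x);   /* re-impose boundary conditions after each sweep */
    }
}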
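The split boundary treatment described above might look like the following CUDA sketch: one kernel handles the four edges in parallel, and a second kernel, launched with a single block and a single thread, fixes the four corners, which depend on freshly written edge values. Because both launches go to the same stream, the corner kernel runs after the edge kernel completes. The kernel names (set_bnd_edges, set_bnd_corners, set_bnd_device) are illustrative, not taken from the paper.

#define IX(i, j) ((i) + (N + 2) * (j))

__global__ void set_bnd_edges(int N, int b, float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;   /* 1..N */
    if (i > N) return;
    x[IX(0, i)]     = (b == 1) ? -x[IX(1, i)] : x[IX(1, i)];
    x[IX(N + 1, i)] = (b == 1) ? -x[IX(N, i)] : x[IX(N, i)];
    x[IX(i, 0)]     = (b == 2) ? -x[IX(i, 1)] : x[IX(i, 1)];
    x[IX(i, N + 1)] = (b == 2) ? -x[IX(i, N)] : x[IX(i, N)];
}

__global__ void set_bnd_corners(int N, float *x)
{
    /* intentionally launched as <<<1, 1>>>: only four scalar updates */
    x[IX(0, 0)]         = 0.5f * (x[IX(1, 0)]     + x[IX(0, 1)]);
    x[IX(0, N + 1)]     = 0.5f * (x[IX(1, N + 1)] + x[IX(0, N)]);
    x[IX(N + 1, 0)]     = 0.5f * (x[IX(N, 0)]     + x[IX(N + 1, 1)]);
    x[IX(N + 1, N + 1)] = 0.5f * (x[IX(N, N + 1)] + x[IX(N + 1, N)]);
}

/* host-side usage, assuming d_x was allocated with cudaMalloc */
static void set_bnd_device(int N, int b, float *d_x)
{
    int threads = 256;
    int blocks  = (N + threads - 1) / threads;
    set_bnd_edges<<<blocks, threads>>>(N, b, d_x);
    set_bnd_corners<<<1, 1>>>(N, d_x);   /* same stream: corners ordered after edges */
}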
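The abstract does not spell out the decomposition, but one plausible reading is a one-dimensional strip decomposition with ghost-row exchange between neighbouring ranks before each sweep, which keeps the communication local in the way the abstract describes. The sketch below uses MPI_Sendrecv for that exchange; exchange_ghost_rows, local_rows, and the prev/next neighbour ranks (MPI_PROC_NULL at the physical boundaries) are assumptions for illustration only.

#include <mpi.h>

#define IX(i, j) ((i) + (N + 2) * (j))

/* x holds local_rows interior rows plus one ghost row below (j = 0) and
 * one above (j = local_rows + 1); prev/next are the neighbouring ranks,
 * or MPI_PROC_NULL where the strip touches the physical boundary.      */
static void exchange_ghost_rows(int N, int local_rows, float *x,
                                int prev, int next, MPI_Comm comm)
{
    int row = N + 2;   /* number of floats in one grid row */

    /* send first interior row to prev; receive next's first row into the top ghost row */
    MPI_Sendrecv(&x[IX(0, 1)],              row, MPI_FLOAT, prev, 0,
                 &x[IX(0, local_rows + 1)], row, MPI_FLOAT, next, 0,
                 comm, MPI_STATUS_IGNORE);

    /* send last interior row to next; receive prev's last row into the bottom ghost row */
    MPI_Sendrecv(&x[IX(0, local_rows)],     row, MPI_FLOAT, next, 1,
                 &x[IX(0, 0)],              row, MPI_FLOAT, prev, 1,
                 comm, MPI_STATUS_IGNORE);
}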
Pages: 385 - 395
Page count: 11