Incompressible Fluid Simulation Parallelization with OpenMP, MPI and CUDA

Cited by: 0
Authors
Jiang, Xuan [1 ]
Lu, Laurence [2 ]
Song, Linyue [3 ]
Affiliations
[1] Univ Calif Berkeley, Civil & Environm Engn Dept, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Elect Engn & Comp Sci Dept, Berkeley, CA USA
[3] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA USA
Source
ADVANCES IN INFORMATION AND COMMUNICATION, FICC, VOL 2 | 2023 / Vol. 652
Keywords
OpenMP; MPI; CUDA; Fluid Simulation; Parallel Computation
DOI
10.1007/978-3-031-28073-3_28
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Our initial serial implementation is based on the original code presented in Jos Stam's paper. OpenMP was the easiest approach to add on top of it: because the solver is grid-based and OpenMP uses a shared-memory model, no mutexes or other data locks were needed, and the pragmas could be inserted without introducing data races. One caveat is the Gauss-Seidel method, which solves the linear system using intermediate values: updating a cell from neighbors that have already been updated in the same sweep can let errors cascade. This issue is avoided by sweeping over the cells in two passes, each covering one half of a disjoint checkerboard pattern.

In the CUDA implementation, the set_bnd function that enforces boundary conditions has two main parts, handling the edges and the corners respectively. Resolving the corners imposes an awkward structure in which a separate kernel is launched with exactly one block and one thread, but this has almost no impact on performance; the most time-consuming parts of our implementation are cudaMalloc and cudaMemcpy. The only synchronization primitive used is __syncthreads(). We deliberately avoided atomic operations, which are comparatively expensive, but __syncthreads() is required at the end of diffuse, project, and advect because the fluid boundaries are reset after every diffusion and advection step. We also note that, without the two-pass method described for OpenMP, the same kind of data races can arise here.

Like the OpenMP version, the pure MPI implementation inherits most of the structure of the serial code, but it additionally performs domain decomposition and the necessary communication. Synchronization happens through these communication steps; because the simulation is local in nature, there is no implicit global barrier and much of the computation can proceed almost asynchronously.
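To make the two-pass update concrete, here is a minimal sketch of red-black Gauss-Seidel relaxation with OpenMP, assuming the (N+2)x(N+2) row-major grid and IX(i, j) indexing macro used in Stam's solver; the function name lin_solve_rb and the fixed 20 sweeps are illustrative assumptions, not the authors' exact code.

#include <omp.h>

#define IX(i, j) ((i) + (N + 2) * (j))   /* row-major index with a 1-cell border */

/* Red-black Gauss-Seidel relaxation: each pass touches a disjoint
 * checkerboard of cells, so no two threads read and write neighboring
 * cells within the same pass. */
static void lin_solve_rb(int N, float *x, const float *x0, float a, float c)
{
    for (int k = 0; k < 20; k++) {            /* fixed number of sweeps, as in Stam's solver */
        for (int pass = 0; pass < 2; pass++) {
            #pragma omp parallel for
            for (int j = 1; j <= N; j++) {
                /* choose the starting column so each pass covers one color
                 * of the checkerboard: pass 0 -> (i + j) odd, pass 1 -> even */
                for (int i = 1 + ((j + pass) & 1); i <= N; i += 2) {
                    x[IX(i, j)] = (x0[IX(i, j)] +
                                   a * (x[IX(i - 1, j)] + x[IX(i + 1, j)] +
                                        x[IX(i, j - 1)] + x[IX(i, j + 1)])) / c;
                }
            }
        }
        /* the full solver would call set_bnd(N, b, x) here after each sweep */
    }
}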
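The CUDA boundary handling described above might look roughly like the following, again assuming Stam's grid layout; the kernel names, the b flag convention, and the launch configuration are assumptions for illustration, with the corners resolved by a separate one-block, one-thread launch.

#define IX(i, j) ((i) + (N + 2) * (j))

/* One thread per boundary row/column index: copy (or negate, for the
 * velocity components) the neighboring interior cell onto the border. */
__global__ void set_bnd_edges(int N, int b, float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  /* 1..N */
    if (i > N) return;
    x[IX(0,     i)] = (b == 1) ? -x[IX(1, i)] : x[IX(1, i)];
    x[IX(N + 1, i)] = (b == 1) ? -x[IX(N, i)] : x[IX(N, i)];
    x[IX(i,     0)] = (b == 2) ? -x[IX(i, 1)] : x[IX(i, 1)];
    x[IX(i, N + 1)] = (b == 2) ? -x[IX(i, N)] : x[IX(i, N)];
}

/* The four corners depend on the freshly written edges, so they are
 * resolved by a separate <<<1, 1>>> launch (one block, one thread). */
__global__ void set_bnd_corners(int N, float *x)
{
    x[IX(0,     0)]     = 0.5f * (x[IX(1, 0)]     + x[IX(0, 1)]);
    x[IX(0,     N + 1)] = 0.5f * (x[IX(1, N + 1)] + x[IX(0, N)]);
    x[IX(N + 1, 0)]     = 0.5f * (x[IX(N, 0)]     + x[IX(N + 1, 1)]);
    x[IX(N + 1, N + 1)] = 0.5f * (x[IX(N, N + 1)] + x[IX(N + 1, N)]);
}

/* Host-side usage (illustrative launch configuration):           */
/*   set_bnd_edges<<<(N + 255) / 256, 256>>>(N, b, d_x);          */
/*   set_bnd_corners<<<1, 1>>>(N, d_x);                           */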
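For the MPI version, a minimal sketch of the ghost-row (halo) exchange under a 1-D strip decomposition is shown below; the rank layout, tags, and buffer naming are assumptions for illustration, and the paper's actual decomposition and communication pattern may differ.

#include <mpi.h>

/* Exchange ghost rows with the ranks above and below.  local holds
 * local_rows interior rows plus one ghost row at each end; row_len is
 * the padded row length (N + 2).  Paired MPI_Sendrecv calls keep the
 * exchange deadlock-free, and these exchanges are the only points of
 * synchronization between neighboring subdomains. */
static void exchange_ghost_rows(float *local, int local_rows, int row_len,
                                int rank, int nprocs, MPI_Comm comm)
{
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send first interior row up, receive bottom ghost row from below */
    MPI_Sendrecv(local + 1 * row_len,                 row_len, MPI_FLOAT, up,   0,
                 local + (local_rows + 1) * row_len,  row_len, MPI_FLOAT, down, 0,
                 comm, MPI_STATUS_IGNORE);

    /* send last interior row down, receive top ghost row from above */
    MPI_Sendrecv(local + local_rows * row_len,        row_len, MPI_FLOAT, down, 1,
                 local + 0 * row_len,                 row_len, MPI_FLOAT, up,   1,
                 comm, MPI_STATUS_IGNORE);
}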
Pages: 385-395
Number of pages: 11