Multi-level parallelism for incompressible flow computations on GPU clusters

Cited by: 56
Authors
Jacobsen, Dana A. [1 ]
Senocak, Inanc [2 ]
Affiliations
[1] Boise State Univ, Dept Comp Sci, Boise, ID 83725 USA
[2] Boise State Univ, Dept Mech & Biomed Engn, Boise, ID 83725 USA
Funding
National Science Foundation (US);
Keywords
GPU; Hybrid MPI-OpenMP-CUDA; Fluid dynamics; MPI; PERFORMANCE;
DOI
10.1016/j.parco.2012.10.002
Chinese Library Classification
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results for strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial benefits in performance. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant advantage in performance over the dual-level implementation on GPU clusters with two GPUs per node, but on clusters with higher GPU counts per node or with different domain decomposition strategies a tri-level implementation may exhibit higher efficiency than a dual-level implementation and needs to be investigated further. (C) 2012 Elsevier B.V. All rights reserved.
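The central pattern the abstract describes — overlapping computation with communication — amounts to launching the halo (boundary) exchange asynchronously, updating the interior cells that need no remote data, and only then finishing the boundary cells. A minimal sketch of that pattern, emulating the asynchronous exchange with a thread pool in place of the paper's MPI_Isend/Irecv and CUDA streams (all names here are illustrative, not the authors' code):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def jacobi_step_overlapped(u, left_halo, right_halo, pool):
    """One Jacobi sweep on a 1-D subdomain, overlapping the halo exchange.

    `left_halo`/`right_halo` stand in for values owned by neighbor ranks.
    """
    # 1. Launch the (simulated) halo exchange asynchronously.
    exchange = pool.submit(lambda: (left_halo, right_halo))
    # 2. Update interior points, which need no halo data, while the
    #    exchange is "in flight".
    new = np.empty_like(u)
    new[1:-1] = 0.5 * (u[:-2] + u[2:])
    # 3. Wait for the halos, then update the two boundary points.
    lh, rh = exchange.result()
    new[0] = 0.5 * (lh + u[1])
    new[-1] = 0.5 * (u[-2] + rh)
    return new

u = np.array([0.0, 1.0, 2.0, 3.0])
with ThreadPoolExecutor(max_workers=1) as pool:
    v = jacobi_step_overlapped(u, left_halo=0.0, right_halo=4.0, pool=pool)
print(v.tolist())  # → [0.5, 1.0, 2.0, 3.0]
```

In the real MPI-CUDA setting, step 2 runs as a GPU kernel on one CUDA stream while nonblocking MPI transfers progress, and step 3 becomes a boundary kernel launched after MPI_Waitall; the paper's three strategies differ in how aggressively these stages are interleaved.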
Pages: 1-20 (20 pages)
Related Papers
50 records in total
  • [21] FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism
    Qiu, Yunhui
    Mao, Yiqing
    Gao, Xuchen
    Chen, Sichao
    Li, Jiangnan
    Yin, Wenbo
    Wang, Lingli
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2024, 17 (01)
  • [22] Exploiting Multi-Level Parallelism for Stitching Very Large Microscopy Images
    Bria, Alessandro
    Bernaschi, Massimo
    Guarrasi, Massimiliano
    Iannello, Giulio
    FRONTIERS IN NEUROINFORMATICS, 2019, 13
  • [23] Multi-level Analysis of GPU Utilization in ML Training Workloads
    Delestrac, Paul
    Bhattacharjee, Debjyoti
    Yang, Simei
    Moolchandani, Diksha
    Catthoor, Francky
    Torres, Lionel
    Novo, David
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2024
  • [24] Multi-level topology for flow visualization
    de Leeuw, W
    van Liere, R
    COMPUTERS & GRAPHICS-UK, 2000, 24 (03): 325 - 331
  • [25] Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster
    Xian, Wang
    Takayuki, Aoki
    PARALLEL COMPUTING, 2011, 37 (09) : 521 - 535
  • [26] Load balancing multi-zone applications on a heterogeneous cluster with multi-level parallelism
    Wong, P
    Jin, HQ
    Becker, J
    ISPDC 2004: THIRD INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING/HETEROPAR '04: THIRD INTERNATIONAL WORKSHOP ON ALGORITHMS, MODELS AND TOOLS FOR PARALLEL COMPUTING ON HETEROGENEOUS NETWORKS, PROCEEDINGS, 2004, : 388 - 393
  • [27] Multi-level Clustering on Metric Spaces Using a Multi-GPU Platform
    Barrientos, Ricardo J.
    Gomez, Jose I.
    Tenllado, Christian
    Prieto Matias, Manuel
    Zezula, Pavel
    EURO-PAR 2013 PARALLEL PROCESSING, 2013, 8097 : 216 - 228
  • [28] On Performance Study of The Global Arrays Toolkit on Homogeneous Grid Computing Environments: Multi-level Topology-Aware and Multi-level Parallelism
    Sirisup, Sirod
    U-ruekolan, Suriya
    ECTI-CON: 2009 6TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2009, : 664 - +
  • [29] A multi-level method for data-driven finite element computations
    Korzeniowski, Tim Fabian
    Weinberg, Kerstin
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2021, 379 (379)
  • [30] Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors
    Martorell, Xavier
    Ayguade, Eduard
    Navarro, Nacho
    Corbalan, Julita
    Gonzalez, Marc
    Labarta, Jesus
    Proceedings of the International Conference on Supercomputing, 1999, : 294 - 301