Multi-level parallelism for incompressible flow computations on GPU clusters

Cited by: 56

Authors:

Jacobsen, Dana A. [1]

Senocak, Inanc [2]

Affiliations:

[1] Boise State Univ, Dept Comp Sci, Boise, ID 83725 USA

[2] Boise State Univ, Dept Mech & Biomed Engn, Boise, ID 83725 USA

Source:

PARALLEL COMPUTING | 2013 / Vol. 39 / Issue 01

Funding:

U.S. National Science Foundation;

Keywords:

GPU; Hybrid MPI-OpenMP-CUDA; Fluid dynamics; MPI; PERFORMANCE;

DOI:

10.1016/j.parco.2012.10.002

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results for strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial benefits in performance. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant advantage in performance over the dual-level implementation on GPU clusters with two GPUs per node, but on clusters with higher GPU counts per node or with different domain decomposition strategies a tri-level implementation may exhibit higher efficiency than a dual-level implementation and needs to be investigated further. (C) 2012 Elsevier B.V. All rights reserved.


Pages: 1-20

Page count: 20
