Multi-level parallelism for incompressible flow computations on GPU clusters

Cited by: 56
Authors
Jacobsen, Dana A. [1]
Senocak, Inanc [2]
Affiliations
[1] Boise State Univ, Dept Comp Sci, Boise, ID 83725 USA
[2] Boise State Univ, Dept Mech & Biomed Engn, Boise, ID 83725 USA
Funding
U.S. National Science Foundation
Keywords
GPU; Hybrid MPI-OpenMP-CUDA; Fluid dynamics; MPI; Performance
DOI
10.1016/j.parco.2012.10.002
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore the efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results for strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and that a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial benefits in performance. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant advantage in performance over the dual-level implementation on GPU clusters with two GPUs per node. On clusters with higher GPU counts per node, or with different domain decomposition strategies, a tri-level implementation may exhibit higher efficiency than a dual-level implementation and needs to be investigated further. © 2012 Elsevier B.V. All rights reserved.
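As an illustration of the strong and weak scaling analysis mentioned in the abstract, the following minimal Python sketch uses the standard textbook definitions of parallel efficiency. It is not taken from the paper: the function names are hypothetical, and the timings in the usage lines are made-up placeholders, not measured results. Only the 17.2 billion cells and 256 GPUs come from the abstract.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling: total problem size is fixed.

    Ideal run time falls in proportion to n_ref / n, so efficiency is
    the ideal time divided by the measured time on n GPUs.
    """
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling: work per GPU is fixed, so ideal run time is constant."""
    return t_ref / t_n


def cells_per_gpu(total_cells, n_gpus):
    """Average number of cells per GPU for an even domain decomposition."""
    return total_cells / n_gpus


# Figures from the abstract: about 17.2 billion cells on 256 GPUs.
print(cells_per_gpu(17.2e9, 256))

# Hypothetical timings (seconds), chosen only to exercise the formulas.
print(strong_scaling_efficiency(t_ref=100.0, n_ref=1, t_n=15.0, n=8))
print(weak_scaling_efficiency(t_ref=100.0, t_n=125.0))
```

With an even split, the largest problem in the abstract works out to roughly 67 million cells per GPU. An efficiency near 1.0 indicates that communication cost is well hidden, which is the goal of overlapping computation with communication.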
Pages: 1-20
Number of pages: 20