Multi-level parallelism for incompressible flow computations on GPU clusters

Cited by: 56
Authors
Jacobsen, Dana A. [1 ]
Senocak, Inanc [2 ]
Affiliations
[1] Boise State Univ, Dept Comp Sci, Boise, ID 83725 USA
[2] Boise State Univ, Dept Mech & Biomed Engn, Boise, ID 83725 USA
Funding
National Science Foundation (US);
Keywords
GPU; Hybrid MPI-OpenMP-CUDA; Fluid dynamics; MPI; PERFORMANCE;
DOI
10.1016/j.parco.2012.10.002
Chinese Library Classification
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results for strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial benefits in performance. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant advantage in performance over the dual-level implementation on GPU clusters with two GPUs per node, but on clusters with higher GPU counts per node or with different domain decomposition strategies a tri-level implementation may exhibit higher efficiency than a dual-level implementation and needs to be investigated further. (C) 2012 Elsevier B.V. All rights reserved.
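The central pattern the abstract describes — overlapping computation with communication — amounts to launching the halo (boundary) exchange asynchronously, updating the interior cells that need no remote data, and only then finishing the boundary cells. A minimal sketch of that pattern, emulating the asynchronous exchange with a thread pool in place of the paper's MPI_Isend/Irecv and CUDA streams (all names here are illustrative, not the authors' code):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def jacobi_step_overlapped(u, left_halo, right_halo, pool):
    """One Jacobi sweep on a 1-D subdomain, overlapping the halo exchange.

    `left_halo`/`right_halo` stand in for values owned by neighbor ranks.
    """
    # 1. Launch the (simulated) halo exchange asynchronously.
    exchange = pool.submit(lambda: (left_halo, right_halo))
    # 2. Update interior points, which need no halo data, while the
    #    exchange is "in flight".
    new = np.empty_like(u)
    new[1:-1] = 0.5 * (u[:-2] + u[2:])
    # 3. Wait for the halos, then update the two boundary points.
    lh, rh = exchange.result()
    new[0] = 0.5 * (lh + u[1])
    new[-1] = 0.5 * (u[-2] + rh)
    return new

u = np.array([0.0, 1.0, 2.0, 3.0])
with ThreadPoolExecutor(max_workers=1) as pool:
    v = jacobi_step_overlapped(u, left_halo=0.0, right_halo=4.0, pool=pool)
print(v.tolist())  # → [0.5, 1.0, 2.0, 3.0]
```

In the real MPI-CUDA setting, step 2 runs as a GPU kernel on one CUDA stream while nonblocking MPI transfers progress, and step 3 becomes a boundary kernel launched after MPI_Waitall; the paper's three strategies differ in how aggressively these stages are interleaved.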
Pages: 1-20 (20 pages)
Related Papers
50 records in total
  • [21] FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism
    Qiu, Yunhui
    Mao, Yiqing
    Gao, Xuchen
    Chen, Sichao
    Li, Jiangnan
    Yin, Wenbo
    Wang, Lingli
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2024, 17 (01)
  • [22] Exploiting Multi-Level Parallelism for Stitching Very Large Microscopy Images
    Bria, Alessandro
    Bernaschi, Massimo
    Guarrasi, Massimiliano
    Iannello, Giulio
    FRONTIERS IN NEUROINFORMATICS, 2019, 13
  • [23] Multi-level Analysis of GPU Utilization in ML Training Workloads
    Delestrac, Paul
    Bhattacharjee, Debjyoti
    Yang, Simei
    Moolchandani, Diksha
    Catthoor, Francky
    Torres, Lionel
    Novo, David
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2024
  • [24] Multi-level topology for flow visualization
    de Leeuw, W
    van Liere, R
    COMPUTERS & GRAPHICS-UK, 2000, 24 (03): 325 - 331
  • [25] Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster
    Xian, Wang
    Takayuki, Aoki
    PARALLEL COMPUTING, 2011, 37 (09) : 521 - 535
  • [26] Load balancing multi-zone applications on a heterogeneous cluster with multi-level parallelism
    Wong, P
    Jin, HQ
    Becker, J
    ISPDC 2004: THIRD INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING/HETEROPAR '04: THIRD INTERNATIONAL WORKSHOP ON ALGORITHMS, MODELS AND TOOLS FOR PARALLEL COMPUTING ON HETEROGENEOUS NETWORKS, PROCEEDINGS, 2004, : 388 - 393
  • [27] Multi-level Clustering on Metric Spaces Using a Multi-GPU Platform
    Barrientos, Ricardo J.
    Gomez, Jose I.
    Tenllado, Christian
    Prieto Matias, Manuel
    Zezula, Pavel
    EURO-PAR 2013 PARALLEL PROCESSING, 2013, 8097 : 216 - 228
  • [28] On Performance Study of The Global Arrays Toolkit on Homogeneous Grid Computing Environments: Multi-level Topology-Aware and Multi-level Parallelism
    Sirisup, Sirod
    U-ruekolan, Suriya
    ECTI-CON: 2009 6TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2009, : 664 - +
  • [29] A multi-level method for data-driven finite element computations
    Korzeniowski, Tim Fabian
    Weinberg, Kerstin
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2021, 379 (379)
  • [30] Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors
    Martorell, Xavier
    Ayguade, Eduard
    Navarro, Nacho
    Corbalan, Julita
    Gonzalez, Marc
    Labarta, Jesus
    Proceedings of the International Conference on Supercomputing, 1999, : 294 - 301