A General Design for a Scalable MPI-GPU Multi-Resolution 2D Numerical Solver

Cited by: 9
Authors
Turchetto, Massimiliano [1]
Palu, Alessandro Dal [2]
Vacondio, Renato [1]
Affiliations
[1] Univ Parma, Engn & Architecture Dept, I-43121 Parma, Italy
[2] Univ Parma, Math Phys Comp Sci Dept, I-43121 Parma, Italy
Keywords
CUDA; multi-GPU; MPI; dynamic load balancing; Hilbert space-filling curves; multi-resolution grid; shallow water equations (SWE); AMR; ADAPTIVE MESH REFINEMENT; PARALLEL; IMPLEMENTATION; CODE
DOI
10.1109/TPDS.2019.2961909
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
This article presents a multi-GPU implementation of a finite-volume solver on a multi-resolution grid. The implementation offloads the computation entirely to the GPUs, and communication between GPUs is implemented with the Message Passing Interface (MPI) API. Several domain decomposition techniques were considered, and the one based on Hilbert space-filling curves (HSFC) showed optimal scalability. Several optimizations are introduced: one-to-one MPI communications among ranks are completely masked by GPU computations on internal cells, and a novel dynamic load balancing algorithm minimizes the waiting times at global MPI synchronization barriers. This algorithm adapts the computational load of each rank in response to dynamic changes in the execution time of blocks and in network performance. Its capability to converge to a balanced computation is shown empirically by numerical experiments. Tests use up to 64 GPUs and 83M cells and achieve an efficiency of 90 percent in weak scalability and 85 percent in strong scalability. The framework is general, and the results of the article can be ported to a wide range of explicit 2D partial differential equation solvers.
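The HSFC-based decomposition mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a uniform n x n grid of equally weighted blocks (the paper uses a multi-resolution grid), and the names `hilbert_index` and `partition_blocks` are hypothetical. The idea is to order blocks along a Hilbert curve and hand each rank one contiguous segment of that ordering, so that each rank's blocks tend to be spatially close.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_blocks(n, n_ranks):
    """Split the n x n blocks into n_ranks contiguous Hilbert-curve segments.

    Assumption: every block costs the same, so segments are balanced by count.
    """
    blocks = sorted(
        ((x, y) for x in range(n) for y in range(n)),
        key=lambda b: hilbert_index(n, b[0], b[1]),
    )
    base, extra = divmod(len(blocks), n_ranks)
    parts, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        parts.append(blocks[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    for rank, part in enumerate(partition_blocks(4, 4)):
        print(rank, part)
```

In a load-balanced setting such as the one the abstract describes, the equal-count split would presumably be replaced by a split on measured per-block cost. That extension is left out of this sketch.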
Pages: 1036-1047
Number of pages: 12