High-performance Shallow Water Model for Use on Massively Parallel and Heterogeneous Computing Systems

Cited by: 0

Authors:

Chaplygin A.V. [1]

Gusev A.V. [1, 2, 3]

Diansky N.A. [1, 3, 4]

Affiliations:

[1] Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow

[2] P.P. Shirshov Institute of Oceanology of the Russian Academy of Sciences, Moscow

[3] N.N. Zubov State Oceanographic Institute, Moscow

[4] Lomonosov Moscow State University, Moscow

Source:

Supercomputing Frontiers and Innovations | 2021 / Vol. 8 / No. 4

Funding:

Russian Foundation for Basic Research

Keywords:

CUDA; Heterogeneous computing systems; MPI; OpenMP; Shallow water; Supercomputer modeling

DOI:

10.14529/JSFI210407

Abstract:

This paper presents a shallow water model formulated from the ocean general circulation sigma model INMOM (Institute of Numerical Mathematics Ocean Model). The shallow water model is based on a software architecture that separates the physics-related code from the parallel implementation features, thereby simplifying the model’s support and development. As an improvement on the two-dimensional domain decomposition method, we present a block-based decomposition that provides load-balanced and cache-friendly calculations on CPUs. We propose various hybrid parallel programming patterns in the shallow water model for effective calculation on massively parallel and heterogeneous computing systems and evaluate their scaling performance on the Lomonosov-2 supercomputer. We demonstrate that performance per grid point on GPUs decreases dramatically for small grid sizes, starting from 2^19 points per node, while performance on CPUs scales well down to 2^17 points per node. Nevertheless, calculations on GPUs outperform calculations on CPUs by a factor of 4.7 at 30 nodes, using 60 GPUs and 360 CPU cores, at a grid size of 6100 × 4460. We demonstrate that overlapping kernel execution with data transfers on GPUs increases performance by 28%. Furthermore, we demonstrate the advantage of using the load-balancing method in the Azov Sea model on CPUs and GPUs. © The Authors 2021. This paper is published with open access at SuperFri.org

Pages: 74-93

Number of pages: 19
