Accelerating atmospheric physics parameterizations using graphics processing units

Citations: 0
Authors
Abdi, Daniel S. [1 ,3 ]
Jankov, Isidora [2 ]
Affiliations
[1] Univ Colorado Boulder, CIRES, Boulder, CO USA
[2] NOAA, Global Syst Lab, Boulder, CO USA
[3] Univ Colorado Boulder, CIRES, 1665 Cent Campus Mall 216 UCB, Boulder, CO 80309 USA
Keywords
Common community physics package; atmospheric physics; atmospheric model; graphics processing units acceleration; MICROPHYSICS SCHEME; EXPLICIT FORECASTS; MODEL; RESOLUTION; WEATHER; GPUS;
DOI
10.1177/10943420241238711
CLC Classification Number
TP3 [Computing technology, computer technology];
Discipline Classification Code
0812;
Abstract
As part of a project exploring the use of next-generation high-performance computing technologies for numerical weather prediction, we have ported two physics modules from the Common Community Physics Package (CCPP) to Graphics Processing Units (GPUs) and obtained speedups of up to 10x relative to a comparable multi-core CPU. The physics parameterizations accelerated in this work are the aerosol-aware Thompson microphysics (TH) scheme and the Grell-Freitas (GF) cumulus convection scheme. Microphysics schemes are among the most time-consuming physics parameterizations, second only to radiation schemes, and our results show better acceleration for the TH scheme than for the GF scheme. Multi-GPU implementations of the schemes show acceptable weak scaling within a single node with 8 GPUs, and perfect weak scaling across multiple nodes using one GPU per node. The absence of inter-node communication in column physics parameterizations contributes to their scalability; however, because physics parameterizations run alongside the dynamics, overall multi-GPU performance is often governed by the latter. In the context of optimizing CCPP physics modules, our observations underscore that extensive use of automatic arrays within inner subroutines hampers GPU performance due to serialized memory allocations. We have used the OpenACC directive-based programming model for this work because it allows large amounts of code to be ported easily and makes code maintenance more manageable than lower-level approaches such as CUDA and OpenCL.
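The two performance points raised in the abstract, column-parallel offloading via OpenACC directives and the cost of automatic arrays inside inner subroutines, can be illustrated with a minimal Fortran sketch. The code below is not taken from the paper; the module and routine names (demo_physics, column_driver, column_update) and the array shapes are illustrative only. The automatic array work(nlev) in the inner routine is the pattern the abstract warns against: each GPU thread allocates it at call time and those device-side allocations serialize, so the usual remedy is to pass in preallocated workspace or use fixed-size locals.

module demo_physics
  implicit none
contains

  ! Per-column update, compiled for the device and called from the
  ! parallel loop below.  The automatic array "work(nlev)" is allocated
  ! on every call; on the GPU these allocations serialize across
  ! threads, which is the slowdown described in the abstract.
  subroutine column_update(nlev, t, q)
    !$acc routine seq
    integer, intent(in)    :: nlev
    real,    intent(inout) :: t(nlev), q(nlev)
    real :: work(nlev)               ! automatic array: avoid on GPU
    integer :: k
    do k = 1, nlev
      work(k) = 0.5 * (t(k) + q(k))
      t(k)    = t(k) + 0.1 * work(k)
    end do
  end subroutine column_update

  ! Driver over independent columns: one loop iteration per column,
  ! which is the natural parallelism of CCPP-style column physics.
  subroutine column_driver(ncol, nlev, t, q)
    integer, intent(in)    :: ncol, nlev
    real,    intent(inout) :: t(nlev, ncol), q(nlev, ncol)
    integer :: i
    !$acc parallel loop gang vector copy(t, q)
    do i = 1, ncol
      call column_update(nlev, t(:, i), q(:, i))
    end do
  end subroutine column_driver

end module demo_physics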
Pages: 282-296
Page count: 15