LICOM3-CUDA: a GPU version of LASG/IAP climate system ocean model version 3 based on CUDA

Authors
Junlin Wei
Jinrong Jiang
Hailong Liu
Feng Zhang
Pengfei Lin
Pengfei Wang
Yongqiang Yu
Xuebin Chi
Lian Zhao
Mengrong Ding
Yiwen Li
Zipeng Yu
Weipeng Zheng
Yuzhu Wang
Affiliations
[1] Chinese Academy of Sciences, Computer Network Information Center
[2] Chinese Academy of Sciences (CAS), State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics (LASG), Institute of Atmospheric Physics (IAP)
[3] University of Chinese Academy of Sciences, Center for Monsoon System Research (CMSR), Institute of Atmospheric Physics (IAP)
[4] Chinese Academy of Sciences (CAS), School of Information Engineering
[5] China University of Geosciences
Source
The Journal of Supercomputing | 2023 / Vol. 79
Keywords
High performance computing; Ocean general circulation model; Graphics processing unit; Compute unified device architecture
DOI
Not available
Abstract
The ocean general circulation model (OGCM) is an essential tool for research in oceanography and atmospheric science. The LASG/IAP climate system ocean model version 3 (LICOM3) is a parallel OGCM. Our goal is to implement and optimize a GPU version of LICOM3 based on the Compute Unified Device Architecture (CUDA), called LICOM3-CUDA. Considering the characteristics of both LICOM3 and CUDA, we design and implement several key optimizations, including redesigning the numerical algorithms of complicated functions, decoupling data dependencies, avoiding memory write conflicts, and optimizing communication. We evaluate the performance of LICOM3-CUDA with two experiments, at 1° (small-scale) and 0.1° (large-scale) resolution. On a platform with two Intel Xeon Gold 6148 CPUs and four NVIDIA Quadro GV100 GPUs, LICOM3-CUDA (1°) achieves a simulation speed of 114.3 simulated years per day (SYPD).
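The abstract lists avoiding memory write conflicts among the optimizations but does not describe how it is done. One common way to remove such conflicts in grid codes is to rewrite a scatter (each cell face updates its two neighbouring cells) as a gather (each cell reads its own faces and writes only itself). The C++ sketch below is a hypothetical illustration of that general technique, not LICOM3-CUDA's implementation: the 1D flux layout and the names `tendency_scatter` and `tendency_gather` are assumptions, and `std::thread` workers stand in for CUDA threads.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical 1D layout: flux holds n+1 face values for n cells, and face i
// lies between cell i-1 (west) and cell i (east). Positive flux points east,
// so the tendency of cell i is flux[i] - flux[i+1].

// Scatter form: loop over faces and update both neighbouring cells.
// Kept serial here; parallelized over faces, the workers handling faces i and
// i+1 would both write cell i, a write conflict unless atomics are used.
std::vector<double> tendency_scatter(const std::vector<double>& flux) {
    assert(!flux.empty());
    const std::size_t n = flux.size() - 1;
    std::vector<double> tend(n, 0.0);
    for (std::size_t f = 0; f <= n; ++f) {
        if (f < n) tend[f] += flux[f];      // inflow into the cell east of face f
        if (f > 0) tend[f - 1] -= flux[f];  // outflow from the cell west of face f
    }
    return tend;
}

// Gather form: each cell reads its two faces and writes only its own entry,
// so cells can be split across workers with no shared writes and no atomics.
std::vector<double> tendency_gather(const std::vector<double>& flux, unsigned workers) {
    assert(!flux.empty());
    if (workers == 0) workers = 1;
    const std::size_t n = flux.size() - 1;
    std::vector<double> tend(n, 0.0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        // Worker w handles cells w, w+workers, ..., like a CUDA grid-stride loop.
        pool.emplace_back([&flux, &tend, n, workers, w] {
            for (std::size_t i = w; i < n; i += workers) {
                tend[i] = flux[i] - flux[i + 1];
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return tend;
}
```

Both functions produce the same tendencies; only the gather form is safe to parallelize without atomics. The trade-off is that each interior face flux is read twice (once by each adjacent cell) instead of once.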
Compared with LICOM3, LICOM3-CUDA runs 6.5 times faster, and the compute-intensive module achieves a speedup of over 70×. In addition, the energy consumption per simulated year is reduced by 41.3%. For the high-resolution, large-scale simulation, as the number of GPUs increases from 96 to 1536, the time consumption of LICOM3-CUDA (0.1°) decreases from 3261 to 720 seconds, a speedup of approximately 4.5×.
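As a quick consistency check on the reported figures, the arithmetic below converts the 1° throughput into wall-clock time per simulated year and relates the 0.1° timings to the increase in GPU count. The efficiency value E is derived here rather than reported in the abstract, and it assumes both 0.1° timings measure the same fixed workload (strong scaling).

$$
\begin{aligned}
t_{\text{year}} &= \frac{86\,400\ \text{s/day}}{114.3\ \text{SYPD}} \approx 755.9\ \text{s} \approx 12.6\ \text{min per simulated year} \quad (1^\circ),\\
S &= \frac{T_{96}}{T_{1536}} = \frac{3261\ \text{s}}{720\ \text{s}} \approx 4.53 \quad (0.1^\circ),\\
E &= \frac{S}{1536/96} = \frac{4.53}{16} \approx 0.28 .
\end{aligned}
$$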
Pages: 9604–9634
Page count: 30