Adaptation of fluid model EULAG to graphics processing unit architecture

Cited by: 17
Authors
Rojek, Krzysztof Andrzej [1]
Ciznicki, Milosz [2]
Rosa, Bogdan [3]
Kopta, Piotr [2]
Kulczewski, Michal [2]
Kurowski, Krzysztof [2]
Piotrowski, Zbigniew Pawel [3]
Szustak, Lukasz [1]
Wojcik, Damian Karol [3]
Wyrzykowski, Roman [1]
Affiliations
[1] Czestochowa Tech Univ, PL-42201 Czestochowa, Poland
[2] Poznan Supercomp & Networking Ctr Applicat, Poznan Wielkopolskia, Poland
[3] Natl Res Inst, Inst Meteorol & Water Management, Warsaw, Poland
Keywords
parallel programming; GPGPU; CUDA; EULAG; MPDATA; stencils; elliptic solver; STENCIL COMPUTATIONS; SOUNDPROOF; PARALLELIZATION; SIMULATION; ALGORITHM
DOI
10.1002/cpe.3417
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The goal of this study is to adapt the multiscale fluid solver EULerian or LAGrangian framework (EULAG) to future graphics processing unit (GPU) platforms. The EULAG model has a proven record of successful applications, as well as excellent efficiency and scalability on conventional supercomputer architectures. Currently, the model is being implemented as the new dynamical core of the COSMO weather prediction framework. Within this study, two main modules of EULAG are analyzed and optimized: the multidimensional positive definite advection transport algorithm (MPDATA) and the variational generalized conjugate residual (GCR) elliptic pressure solver. We propose a method that provides a comprehensive analysis of resource consumption, covering registers as well as shared and global memory. This method allows us to identify bottlenecks of the algorithm, including data transfers between host and global memory and between global and shared memory, as well as GPU occupancy. We put the emphasis on providing a fixed memory access pattern, padding, and organizing computation in the MPDATA algorithm. Testing and validation of the new GPU implementation have been carried out by modeling decaying turbulence of a homogeneous incompressible fluid in a triply periodic cube. Simulations performed using the standard version of EULAG and its new GPU implementation give similar solutions. Preliminary results show a promising increase in computational efficiency. Copyright (c) 2014 John Wiley & Sons, Ltd.
Pages: 937-957
Number of pages: 21