GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices

Cited by: 1
Authors
Utpal Kiran
Sachin Singh Gautam
Deepak Sharma
Affiliation
[1] Indian Institute of Technology, Department of Mechanical Engineering
Source
Computing | 2020 / Vol. 102
Keywords
Matrix-free solver; Finite element method; GPU; CUDA; Parallel computing; 74S05; 65Y05
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Matrix-free solvers for finite element method (FEM) avoid assembly of elemental matrices and replace sparse matrix-vector multiplication required in iterative solution method by an element level dense matrix-vector product. In this paper, a novel matrix-free strategy for FEM is proposed which computes element level matrix-vector product by using only the symmetric part of the elemental matrices. The proposed strategy is developed to take advantage of the massive parallelism of Graphics Processing Unit (GPU). A unique data structure is also introduced which ensures localized and coalesced memory access suitable for a GPU while storing only the symmetric part of the elemental matrices. In addition, the proposed strategy emphasizes the efficient use of register cache, uniform workload distribution, reducing thread synchronization, and maintaining sufficient granularity to make the best use of GPU resources. The performance of the proposed strategy is evaluated by solving elasticity and heat conduction problems using 4-noded quadrilateral element with two degrees of freedom (DOFs) and one DOF per node, respectively. The performance is compared with the matrix-free solver strategies on GPU from the literature. It is found that a maximum speedup of 4.9× is obtained for the elasticity problem and a maximum of 3.2× speedup for the heat conduction problem. Further, the proposed strategy takes the least amount of GPU memory as compared to the existing strategies.
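The element-level product described in the abstract can be illustrated with a minimal serial sketch. It is not the paper's GPU data structure or kernel: the function names (`pack_upper`, `sym_matvec`, `matrix_free_matvec`), the row-wise packed upper-triangle layout, and the tiny two-element example are assumptions made here for illustration only. The sketch shows the general idea that a symmetric n×n elemental matrix needs only n(n+1)/2 stored entries (36 instead of 64 for a 4-noded quadrilateral with two DOFs per node), and that each stored off-diagonal entry contributes to two rows of the product.

```python
def pack_upper(ke):
    """Keep only the upper triangle (diagonal included) of a symmetric
    elemental matrix, stored row by row. Layout is an assumption."""
    n = len(ke)
    return [ke[i][j] for i in range(n) for j in range(i, n)]


def sym_matvec(packed, xe):
    """Dense product y = Ke * xe using only the packed upper triangle."""
    n = len(xe)
    ye = [0.0] * n
    k = 0
    for i in range(n):
        ye[i] += packed[k] * xe[i]      # diagonal entry
        k += 1
        for j in range(i + 1, n):
            a = packed[k]               # entry (i, j), equal to entry (j, i)
            ye[i] += a * xe[j]
            ye[j] += a * xe[i]
            k += 1
    return ye


def matrix_free_matvec(packed_elems, conn, x, n_dofs):
    """Global y = K * x without assembling K: gather the element DOFs,
    multiply at element level, scatter-add the result."""
    y = [0.0] * n_dofs
    for packed, dofs in zip(packed_elems, conn):
        xe = [x[d] for d in dofs]       # gather
        ye = sym_matvec(packed, xe)
        for d, v in zip(dofs, ye):      # scatter-add
            y[d] += v
    return y


# Hypothetical example: two 1D two-node elements sharing the middle DOF.
ke = [[1.0, -1.0], [-1.0, 1.0]]
packed_elems = [pack_upper(ke), pack_upper(ke)]
conn = [[0, 1], [1, 2]]
y = matrix_free_matvec(packed_elems, conn, [1.0, 2.0, 4.0], 3)
```

In a GPU setting the scatter-add over shared DOFs would need atomic updates or a coloring scheme, and the packed entries would be arranged so that neighbouring threads read neighbouring memory locations; how the paper handles those points is not stated in the abstract beyond the goals of coalesced access and reduced synchronization.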
Pages: 1941-1965
Page count: 24