Finite Element Algorithms and Data Structures on Graphical Processing Units

Cited by: 0
Authors
I. Z. Reguly
M. B. Giles
Affiliations
[1] Pázmány Péter Catholic University
[2] Oxford e-Research Centre
Source
International Journal of Parallel Programming | 2015 / Vol. 43
Keywords
Graphical processing unit; Finite element method; Performance analysis; Sparse matrix-vector multiplication; Preconditioned conjugate gradient method
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The finite element method (FEM) is one of the most commonly used techniques for the solution of partial differential equations on unstructured meshes. This paper discusses both the assembly and the solution phases of the FEM with special attention to the balance of computation and data movement. We present a GPU assembly algorithm that scales to arbitrary degree polynomials used as basis functions, at the expense of redundant computations. We show how the storage of the stiffness matrix affects the performance of both the assembly and the solution. We investigate two approaches: global assembly into the CSR and ELLPACK matrix formats and matrix-free algorithms, and show the trade-off between the amount of indexing data and stiffness data. We discuss the performance of different approaches in light of the implicit caches on Fermi GPUs and show a speedup over a two-socket 12-core CPU of up to 10 times in the assembly and up to 6 times in the solution phase. We present our sparse matrix-vector multiplication algorithms that are part of a conjugate gradient iteration and show that a matrix-free approach may be up to two times faster than global assembly approaches and up to 4 times faster than NVIDIA’s cuSPARSE library, depending on the preconditioner used.
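The two global-assembly storage formats named in the abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation: the function names, the serial Python loops, and the 3x3 tridiagonal test matrix are assumptions chosen for illustration. CSR stores one column index per nonzero plus a row-pointer array, while ELLPACK pads every row to the length of the longest row, trading extra stored values for regular indexing.

```python
# Illustrative sketch only: serial CSR and ELLPACK sparse matrix-vector
# products. All names and the test matrix are hypothetical.

def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A x with A in CSR form (row pointers, column indices, values)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def csr_to_ellpack(row_ptr, col_idx, vals):
    """Pad every row to the longest row length (padding: column 0, value 0.0)."""
    n_rows = len(row_ptr) - 1
    width = max(row_ptr[i + 1] - row_ptr[i] for i in range(n_rows))
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    for i in range(n_rows):
        for j, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][j] = col_idx[k]
            ell_vals[i][j] = vals[k]
    return ell_cols, ell_vals

def ell_spmv(ell_cols, ell_vals, x):
    """y = A x with A in ELLPACK form; padded entries contribute zero."""
    return [sum(v * x[c] for c, v in zip(cols, row_vals))
            for cols, row_vals in zip(ell_cols, ell_vals)]

# Hypothetical 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0]

ell_cols, ell_vals = csr_to_ellpack(row_ptr, col_idx, vals)
y_csr = csr_spmv(row_ptr, col_idx, vals, x)
y_ell = ell_spmv(ell_cols, ell_vals, x)
```

In this small case CSR holds 7 values and ELLPACK holds 9, two of them padding. A matrix-free approach would store neither array and would instead recompute element contributions during each product.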
Pages: 203-239
Page count: 36