Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS

Cited by: 2

Authors:

Trotter, James D. [1]

Langguth, Johannes [2]

Cai, Xing [3]

Affiliations:

[1] Simula Res Lab, Kristian Augusts gate 23, N-0164 Oslo, Norway

[2] Univ Bergen, Dept Informat, POB 7803, N-5020 Bergen, Norway

[3] Univ Oslo, Dept Informat, POB 1080 Blindern, N-0316 Oslo, Norway

Source:

PARALLEL COMPUTING | 2023 / Vol. 118

Funding:

European Union Horizon 2020;

Keywords:

Finite element method; Automated code generation; GPU computing; CUDA; Unstructured mesh; SOLVERS;

DOI:

10.1016/j.parco.2023.103051

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202;

Abstract:

This paper studies the use of automated code generation to provide user-friendly GPU acceleration for solving partial differential equations (PDEs) with finite element methods. By extending the FEniCS framework and its form compiler, we enable high-level descriptions of finite element computations, written in the Unified Form Language (UFL), to be automatically translated into parallelised CUDA C++ code. The generated code offloads the finite element assembly of linear systems to the GPU, and these systems are then solved by a GPU-supported linear algebra backend. Specifically, we explore several automatically generated optimisations of the resulting CUDA C++ code. Numerical experiments show that GPU-based linear system assembly for a typical PDE with first-order elements can benefit from using a lookup table to avoid repeatedly performing numerous binary searches, and that further performance gains can be obtained by assembling the sparse matrix row by row. More importantly, the extended FEniCS compiler seamlessly couples the assembly and solution phases on the GPU, so that all unnecessary CPU-GPU data transfers are eliminated. Detailed experiments quantify the negative impact of these data transfers, which can completely negate the benefit of GPU acceleration if the assembly and solution phases are offloaded to the GPU separately. Finally, a complete, automatically generated GPU-based PDE solver for a nonlinear solid mechanics application, including GPU-accelerated algebraic multigrid as the preconditioner, demonstrates a substantial speedup over running on dual-socket multi-core CPUs.
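To illustrate the lookup-table idea mentioned in the abstract, here is a minimal CPU-side C++ sketch; it is not the authors' generated CUDA code. It assumes a compressed sparse row (CSR) matrix with a fixed sparsity pattern and column indices sorted within each row, and first-order (P1) triangles with three degrees of freedom per cell. The names `CsrMatrix`, `find_nonzero`, `build_lookup_table`, `assemble_binary_search` and `assemble_lookup_table` are hypothetical. The baseline performs one binary search per element-matrix entry on every assembly, whereas the lookup-table variant performs those searches once and reuses the stored nonzero indices.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// CSR matrix with a fixed sparsity pattern (assumed layout for this sketch).
struct CsrMatrix {
    std::vector<int> row_ptr;    // size nrows + 1
    std::vector<int> col_idx;    // size nnz, sorted within each row
    std::vector<double> values;  // size nnz
};

// Assumption: first-order (P1) triangles, three degrees of freedom per cell.
constexpr int kDofs = 3;
using CellDofs = std::array<int, kDofs>;                  // global dof numbers of one cell
using ElementMatrix = std::array<double, kDofs * kDofs>;  // row-major local matrix

// Binary search for the position of entry (row, col) in the values array.
int find_nonzero(const CsrMatrix& A, int row, int col) {
    auto first = A.col_idx.begin() + A.row_ptr[row];
    auto last = A.col_idx.begin() + A.row_ptr[row + 1];
    auto it = std::lower_bound(first, last, col);
    assert(it != last && *it == col && "entry missing from sparsity pattern");
    return static_cast<int>(it - A.col_idx.begin());
}

// Baseline: one binary search per element-matrix entry on every assembly.
void assemble_binary_search(CsrMatrix& A, const std::vector<CellDofs>& cells,
                            const std::vector<ElementMatrix>& Ae) {
    std::fill(A.values.begin(), A.values.end(), 0.0);
    for (std::size_t c = 0; c < cells.size(); ++c)
        for (int i = 0; i < kDofs; ++i)
            for (int j = 0; j < kDofs; ++j)
                A.values[find_nonzero(A, cells[c][i], cells[c][j])] += Ae[c][i * kDofs + j];
}

// Lookup table: search once per (cell, i, j) and store the nonzero index.
std::vector<int> build_lookup_table(const CsrMatrix& A, const std::vector<CellDofs>& cells) {
    std::vector<int> lut(cells.size() * kDofs * kDofs);
    for (std::size_t c = 0; c < cells.size(); ++c)
        for (int i = 0; i < kDofs; ++i)
            for (int j = 0; j < kDofs; ++j)
                lut[c * kDofs * kDofs + i * kDofs + j] = find_nonzero(A, cells[c][i], cells[c][j]);
    return lut;
}

// Reassembly with the table: a direct scatter into values, no searching.
void assemble_lookup_table(CsrMatrix& A, const std::vector<int>& lut,
                           const std::vector<ElementMatrix>& Ae) {
    std::fill(A.values.begin(), A.values.end(), 0.0);
    for (std::size_t c = 0; c < Ae.size(); ++c)
        for (int k = 0; k < kDofs * kDofs; ++k)
            A.values[lut[c * kDofs * kDofs + k]] += Ae[c][k];
}
```

Because the mesh and sparsity pattern usually stay fixed between assemblies (for example across Newton iterations of a nonlinear solve), the one-off cost of building the table is amortised, at the price of storing one integer per element-matrix entry. In a GPU version where each thread handles one cell, neighbouring cells add into shared nonzeros, so the `+=` would have to become an atomic update; assembling row by row, as the abstract mentions, is one way such write conflicts could be avoided.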

Pages: 12

References: 58 in total

