A DYNAMIC PATTERN FACTORED SPARSE APPROXIMATE INVERSE PRECONDITIONER ON GRAPHICS PROCESSING UNITS

被引:10
作者
Bernaschi, Massimo [1 ]
Carrozzo, Mauro [1 ]
Franceschini, Andrea [2 ]
Janna, Carlo [3 ]
机构
[1] CNR, Inst Appl Comp, I-00185 Rome, Italy
[2] Univ Padua, Dept ICEA, I-35131 Padua, Italy
[3] M3E Srl, I-35129 Padua, Italy
关键词
GPUs; approximate inverses; preconditioning; CONJUGATE-GRADIENT; LINEAR-SYSTEMS;
D O I
10.1137/18M1197461
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
One of the most time-consuming tasks in the procedures for the numerical study of PDEs is the solution to linear systems of equations. To that purpose, iterative solvers are viewed as a promising alternative to direct methods on high-performance computers since, in theory, they are almost perfectly parallelizable. Their main drawback is the need of finding a suitable preconditioner to accelerate convergence. The factorized sparse approximate inverse (FSAI), mainly in its adaptive form, has proven to be an effective parallel preconditioner for several problems. In the present work, we report about two novel ideas to dynamically compute, on graphics processing units (GPUs), the FSAI sparsity pattern, which is the main task in its setup. The first approach, borrowed from the CPU implementation, uses a global array as a nonzero indicator, whereas the second one relies on a merge-sort procedure of multiple arrays. We will show that the second approach requires significantly less memory and overcomes issues related to the limited global memory available on GPUs. Numerical tests prove that the GPU implementation of FSAI allows for an average speed-up of 7.5 over a parallel CPU implementation. Moreover, we will show that the preconditioner computation is still feasible using single precision arithmetic with a further 20% reduction of the setup cost. Finally, the strong scalability of the overall approach in shown in a multi-GPU setting.
引用
收藏
页码:C139 / C160
页数:22
相关论文
共 37 条
[1]  
[Anonymous], ZAP NAUCHN SEM S PET
[2]  
[Anonymous], 2015, NVIDIA CUDA C Programming Guide
[3]   Incomplete Sparse Approximate Inverses for Parallel Preconditioning [J].
Anzt, Hartwig ;
Huckle, Thomas K. ;
Braeckle, Juergen ;
Dongarra, Jack .
PARALLEL COMPUTING, 2018, 71 :1-22
[4]   Rigid body modes deflation of the Preconditioned Conjugate Gradient in the solution of discretized structural problems [J].
Baggio, Roberta ;
Franceschini, Andrea ;
Spiezia, Nicolo ;
Janna, Carlo .
COMPUTERS & STRUCTURES, 2017, 185 :15-26
[5]   EXPOSING FINE-GRAINED PARALLELISM IN ALGEBRAIC MULTIGRID METHODS [J].
Bell, Nathan ;
Dalton, Steven ;
Olson, Luke N. .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04) :C123-C152
[6]   A comparative study of sparse approximate inverse preconditioners [J].
Benzi, M ;
Tuma, M .
APPLIED NUMERICAL MATHEMATICS, 1999, 30 (2-3) :305-340
[8]   A sparse approximate inverse preconditioner for the conjugate gradient method [J].
Benzi, M ;
Meyer, CD ;
Tuma, M .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1996, 17 (05) :1135-1149
[9]   A FACTORED SPARSE APPROXIMATE INVERSE PRECONDITIONED CONJUGATE GRADIENT SOLVER ON GRAPHICS PROCESSING UNITS [J].
Bernaschi, Massimo ;
Bisson, Mauro ;
Fantozzi, Carlo ;
Janna, Carlo .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2016, 38 (01) :C53-C72
[10]   Approximate Inverse Preconditioners for Krylov Methods on Heterogeneous Parallel Computers [J].
Bertaccini, Daniele ;
Filippone, Salvatore .
PARALLEL COMPUTING: ACCELERATING COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, 25 :183-192