Mixed-precision block incomplete sparse approximate preconditioner on Tensor core

Cited by: 3
Authors
Zhang, Haoyuan [1, 2]
Ma, Wenpeng [3]
Yuan, Wu [1, 2]
Zhang, Jian [1, 2]
Lu, Zhonghua [1, 2]
Affiliations
[1] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Xinyang Normal Univ, Xinyang, Peoples R China
Keywords
Block-ISAI; GPU; Mixed-precision; Tensor core; Preconditioner;
DOI
10.1007/s42514-023-00165-9
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose and implement a mixed-precision Block-ISAI preconditioner for solving linear systems arising in multiphysics applications. By leveraging FP32 computing, our approach accelerates the sparse matrix-vector product kernel while maintaining satisfactory accuracy. We also propose an efficient, warp-based GPU implementation of the Block-ISAI preconditioner with Tensor core acceleration, in which the matrix-multiplication portion runs on the double-precision Tensor cores of the NVIDIA A100 GPU. Detailed comparisons demonstrate the effectiveness of our method: it is 6x faster than cuSPARSE and 11.2x faster than PETSc's built-in preconditioner.
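The mixed-precision idea in the abstract can be illustrated with a minimal sketch. This is not the authors' GPU kernel: it is a pure-Python emulation that assumes the matrix values are stored in FP32 while accumulation stays in FP64, and it omits the Block-ISAI construction and Tensor core parts entirely. The helper names `to_fp32` and `csr_spmv` and the 3x3 test matrix are hypothetical, chosen only for illustration.

```python
import struct


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for A in CSR format, accumulating in FP64."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 tridiagonal matrix in CSR format (illustration only).
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values64 = [4.1, -1.3, -1.3, 4.1, -1.3, -1.3, 4.1]

# Assumption: the low-precision copy keeps the sparsity pattern and
# rounds only the stored values to FP32.
values32 = [to_fp32(v) for v in values64]

x = [1.0, 2.0, 3.0]
y64 = csr_spmv(row_ptr, col_idx, values64, x)
y32 = csr_spmv(row_ptr, col_idx, values32, x)

# Largest relative deviation of the FP32-storage result from FP64.
max_rel_err = max(abs(a - b) / abs(a) for a, b in zip(y64, y32))
print(max_rel_err)
```

On a GPU the benefit of FP32 storage comes mainly from halving the memory traffic of the bandwidth-bound sparse matrix-vector product; the sketch only shows that the perturbation introduced by rounding the stored values is small for this example.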
Pages: 54-67
Number of pages: 14