Mixed-precision block incomplete sparse approximate preconditioner on Tensor core

Cited by: 3

Authors:

Zhang, Haoyuan ^{[1,2]}

Ma, Wenpeng ^{[3]}

Yuan, Wu ^{[1,2]}

Zhang, Jian ^{[1,2]}

Lu, Zhonghua ^{[1,2]}

Affiliations:

[1] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Xinyang Normal Univ, Xinyang, Peoples R China

Source:

CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING | 2024 / Vol. 6 / Issue 01

Keywords:

Block-ISAI; GPU; Mixed-precision; Tensor core; Preconditioner;

DOI:

10.1007/s42514-023-00165-9

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

In this paper, we propose and implement a mixed-precision Block-ISAI preconditioner for solving linear systems arising in multiphysics applications. By leveraging FP32 computing, our approach accelerates the sparse matrix-vector product kernel while maintaining satisfactory accuracy. We also propose an efficient, warp-based GPU implementation of the Block-ISAI preconditioner with Tensor core acceleration, in which the matrix-multiplication portion is accelerated by the double-precision Tensor cores of the NVIDIA A100 GPU. Detailed comparisons show a noteworthy speedup: our method is 6x faster than cuSPARSE and 11.2x faster than PETSc's built-in preconditioner.


Pages: 54-67

Page count: 14
