A new diagonal storage for efficient implementation of sparse matrix-vector multiplication on graphics processing unit

Cited by: 4
Authors
He, Guixia [1 ]
Chen, Qi [2 ]
Gao, Jiaquan [2 ]
Affiliations
[1] Zhejiang Univ Technol, Zhijiang Coll, Hangzhou, Peoples R China
[2] Nanjing Normal Univ, Sch Comp & Elect Informat, Jiangsu Key Lab NSLSCS, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
CUDA; GPU; multidiagonal sparse matrices; sparse matrix-vector multiplication; sparse storage format; OPTIMIZATION;
DOI
10.1002/cpe.6230
Chinese Library Classification
TP31 [Computer Software];
Discipline Classification Code
081202; 0835;
Abstract
The sparse matrix-vector multiplication (SpMV) is of great importance in computational science. For multidiagonal sparse matrices whose diagonals contain many long zero sections or scattered points, a great number of explicit zeros must be filled in to maintain the diagonal structure when the popular DIA format is used to store them, which degrades the performance of the DIA kernel. To alleviate this drawback of DIA, we present a novel diagonal storage format, called RBDCS (diagonal compressed storage based on row-blocks), for multidiagonal sparse matrices, and propose an efficient SpMV kernel based on it. Because the RBDCS kernel code must be rewritten by hand for each multidiagonal sparse matrix, a code generator is also presented to generate RBDCS kernel codes automatically. Experimental results show that the proposed RBDCS kernel is effective, and outperforms HYBMV in the CUSPARSE library as well as three popular diagonal SpMV kernels: DIA, HDI, and CRSD.
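To make the DIA drawback concrete, the following is a minimal sketch of a conventional DIA-format SpMV CUDA kernel (one thread per row), in the style commonly attributed to Bell and Garland. It is not the paper's RBDCS kernel, and all names (num_rows, num_diags, offsets, data, x, y) are illustrative assumptions rather than identifiers from the paper.

// Minimal DIA SpMV sketch: each thread computes one row of y = A*x.
// data holds the diagonals padded to length num_rows, stored column-major,
// so diagonal d, row r is at data[d*num_rows + r]; offsets gives each
// diagonal's offset from the main diagonal.
__global__ void dia_spmv(int num_rows, int num_cols, int num_diags,
                         const int *offsets, const float *data,
                         const float *x, float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= num_rows) return;

    float dot = 0.0f;
    for (int d = 0; d < num_diags; ++d) {
        int col = row + offsets[d];
        // Padding zeros stored inside a diagonal are still loaded and
        // multiplied here; for diagonals with long zero sections or
        // scattered nonzeros this wasted work is what a compressed
        // row-block layout aims to avoid.
        if (col >= 0 && col < num_cols)
            dot += data[d * num_rows + row] * x[col];
    }
    y[row] = dot;
}

In this sketch every explicitly stored zero still costs a memory load and a multiply-add in the inner loop, which is why matrices with long zero sections or scattered points on their diagonals degrade the DIA kernel and motivate a format that stores only the populated parts of each row block.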
Pages: 15