Adaptive diagonal sparse matrix-vector multiplication on GPU

Cited by: 6
Authors
Gao, Jiaquan [1 ]
Xia, Yifei [1 ]
Yin, Renjie [1 ]
He, Guixia [2 ]
Affiliations
[1] Nanjing Normal Univ, Sch Comp & Elect Informat, Jiangsu Key Lab NSLSCS, Nanjing 210023, Peoples R China
[2] Zhejiang Univ Technol, Zhijiang Coll, Hangzhou 310024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Diagonal sparse matrices; Sparse matrix-vector multiplication; Sparse storage format; CUDA; GPU; OPTIMIZATION; FORMAT; SPMV;
DOI
10.1016/j.jpdc.2021.07.007
Chinese Library Classification
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
For diagonal sparse matrices with many long zero sections, scattered points, or diagonals far from the main diagonal, a great number of zeros must be filled in to maintain the diagonal structure when they are stored in DIA. This degrades the performance of existing DIA kernels because the padded zeros consume extra computation and memory resources. This motivates us to present an adaptive sparse matrix-vector multiplication (SpMV) for diagonal sparse matrices on the graphics processing unit (GPU), called DIA-Adaptive, to alleviate this drawback of DIA kernels. DIA-Adaptive has the following characteristics: (1) in addition to adopting DIA, two new sparse storage formats, BRCSD-I and BRCSD-II (Diagonal Compressed Storage based on Row-Blocks), are proposed to suit various types of diagonal sparse matrices, together with SpMV kernels corresponding to each storage format; (2) a search engine is designed to choose the most appropriate storage format among DIA, BRCSD-I, and BRCSD-II for any given diagonal sparse matrix; and (3) a code generator automatically generates the SpMV kernels. Using DIA-Adaptive, the ideal storage format and kernel are chosen automatically for any given diagonal sparse matrix, and thus high performance is achieved. Experimental results show that the proposed DIA-Adaptive is effective, achieves high performance and good parallelism, and outperforms state-of-the-art SpMV algorithms on all test cases. (C) 2021 Elsevier Inc. All rights reserved.
Pages: 287-302
Page count: 16