Adaptive diagonal sparse matrix-vector multiplication on GPU

Cited by: 6
Authors
Gao, Jiaquan [1 ]
Xia, Yifei [1 ]
Yin, Renjie [1 ]
He, Guixia [2 ]
Affiliations
[1] Nanjing Normal Univ, Sch Comp & Elect Informat, Jiangsu Key Lab NSLSCS, Nanjing 210023, Peoples R China
[2] Zhejiang Univ Technol, Zhijiang Coll, Hangzhou 310024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Diagonal sparse matrices; Sparse matrix-vector multiplication; Sparse storage format; CUDA; GPU; OPTIMIZATION; FORMAT; SPMV;
DOI
10.1016/j.jpdc.2021.07.007
Chinese Library Classification
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
For diagonal sparse matrices with many long zero sections, scattered points, or diagonals far from the main diagonal, a great number of zeros must be filled in to maintain the diagonal structure when they are stored in DIA. This degrades the performance of existing DIA kernels because the padded zeros consume extra computation and memory resources. This motivates us to present an adaptive sparse matrix-vector multiplication (SpMV) for diagonal sparse matrices on the graphics processing unit (GPU), called DIA-Adaptive, to alleviate this drawback of DIA kernels. DIA-Adaptive has the following characteristics: (1) in addition to adopting DIA, two new sparse storage formats, BRCSD-I and BRCSD-II (Diagonal Compressed Storage based on Row-Blocks), are proposed to suit various types of diagonal sparse matrices, together with SpMV kernels corresponding to each storage format; (2) a search engine is designed to choose the most appropriate storage format among DIA, BRCSD-I, and BRCSD-II for any given diagonal sparse matrix; and (3) a code generator automatically generates the SpMV kernels. Using DIA-Adaptive, the ideal storage format and kernel are chosen automatically for any given diagonal sparse matrix, and thus high performance is achieved. Experimental results show that the proposed DIA-Adaptive is effective, achieves high performance and good parallelism, and outperforms state-of-the-art SpMV algorithms on all test cases. (C) 2021 Elsevier Inc. All rights reserved.
Pages: 287-302
Page count: 16