Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

被引:0
作者
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
机构
[1] Harbin Engn Univ, Dept Comp Sci, Harbin, Peoples R China
关键词
GPU; memory bandwidth; sparse matrices; SpMV; OPTIMIZATION; FORMAT; SPMV; SIMD;
D O I
10.1002/cpe.8366
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis. The performance of SpMV on GPUs is crucial for accelerating various applications. However, the efficiency of SpMV on GPUs is significantly affected by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an Ellpack (ELL) portion and a Coordinate (COO) portion. This partitioning is followed by further recursive division of the COO portion into additional ELL and COO portions, continuing this process until predefined termination criteria, based on a percentage threshold of the number of nonzero elements, are met. Additionally, we introduce a dynamic partitioning method to determine the optimal threshold for partitioning the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and achieve high thread-level parallelism. We evaluated our proposed method on two different NVIDIA GPUs: the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection. We compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13x$$ \times $$, 1.13x$$ \times $$, 1.87x$$ \times $$, and 1.27x$$ \times $$ over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
引用
收藏
页数:13
相关论文
共 50 条
  • [41] Performance evaluation of sparse matrix-vector product (SpMV) computation on GPU architecture
    Kasmi, Najlae
    Mahmoudi, Sidi Ahmed
    Zbakh, Mostapha
    Manneback, Pierre
    2014 SECOND WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2014, : 23 - 27
  • [42] Sparse Matrix-Vector Multiplication Based on Online Arithmetic
    Cherati, Sahar Moradi
    Jaberipur, Ghassem
    Sousa, Leonel
    IEEE ACCESS, 2024, 12 : 87653 - 87664
  • [43] Processor-efficient sparse matrix-vector multiplication
    Heath, LS
    Ribbens, CJ
    Pemmaraju, SV
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2004, 48 (3-4) : 589 - 608
  • [44] Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression
    Boukaram, Wajih
    Turkiyyah, George
    Keyes, David
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, 45 (01):
  • [45] SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
    Hishinuma, Toshiaki
    Hasegawa, Hidehiko
    Tanaka, Teruo
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 21 - 34
  • [46] Joint direct and transposed sparse matrix-vector multiplication for multithreaded CPUs
    Kozicky, Claudio
    Simecek, Ivan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (13)
  • [47] Sparse Matrix-Vector Multiplication Cache Performance Evaluation and Design Exploration
    Cui, Jianfeng
    Lu, Kai
    Liu, Sheng
    29TH INTERNATIONAL SYMPOSIUM ON THE MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2021), 2021, : 97 - 103
  • [48] CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
    Liu, Weifeng
    Vinter, Brian
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015, : 339 - 350
  • [49] A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication
    Fowers, Jeremy
    Ovtcharov, Kalin
    Strauss, Karin
    Chung, Eric S.
    Stitt, Greg
    2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014, : 36 - 43
  • [50] Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores
    Chen, YuAng
    Yu, Jeffery Xu
    53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024, : 1135 - 1144