Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited: 0
Authors
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
Affiliations
[1] Harbin Engineering University, Department of Computer Science, Harbin, People's Republic of China
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
CLC Number
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis, and its performance on GPUs is crucial for accelerating a wide range of applications. However, the efficiency of SpMV on GPUs is significantly limited by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an ELLPACK (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively divided into further ELL and COO portions, and this process continues until a predefined termination criterion, based on a percentage threshold of the number of nonzero elements, is met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and to achieve high thread-level parallelism. We evaluate the proposed method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
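To make the recursive scheme concrete, below is a minimal host-side sketch of the partitioning and the resulting multi-level SpMV. It is an illustration under stated assumptions, not the paper's implementation: the function names, the median-row-length width heuristic, and the 1% termination fraction (stop_frac) are hypothetical stand-ins for the dynamic partitioning method described in the abstract.

    import numpy as np

    def split_ell_coo(rows, cols, vals, n_rows, k):
        # Pack up to k nonzeros per row into a zero-padded ELL block
        # (column indices and values); overflow entries stay in COO.
        ell_cols = np.zeros((n_rows, k), dtype=np.int64)
        ell_vals = np.zeros((n_rows, k), dtype=np.float64)
        slot = np.zeros(n_rows, dtype=np.int64)   # next free ELL slot per row
        coo = []
        for r, c, v in zip(rows, cols, vals):
            if slot[r] < k:
                ell_cols[r, slot[r]] = c
                ell_vals[r, slot[r]] = v
                slot[r] += 1
            else:
                coo.append((r, c, v))
        return (ell_cols, ell_vals), coo

    def rhc_partition(rows, cols, vals, n_rows, stop_frac=0.01):
        # Recursively peel ELL levels off the COO remainder until the
        # remainder holds less than stop_frac of the original nonzeros.
        # stop_frac and the median-width heuristic are assumptions; the
        # paper derives the threshold from the nonzero distribution and
        # the memory footprint.
        total_nnz = len(vals)
        levels = []
        while len(vals) > stop_frac * total_nnz:
            counts = np.bincount(rows, minlength=n_rows)
            k = max(1, int(np.median(counts[counts > 0])))  # hypothetical width
            ell, coo = split_ell_coo(rows, cols, vals, n_rows, k)
            levels.append(ell)
            if not coo:                      # everything fit into ELL levels
                return levels, ([], [], [])
            rows, cols, vals = map(list, zip(*coo))
        return levels, (rows, cols, vals)

    def rhc_spmv(levels, coo, x, n_rows):
        # y = A @ x: one dense, regularly strided pass per ELL level
        # (padded slots hold value 0, so they contribute nothing),
        # plus a final pass over the small COO remainder.
        y = np.zeros(n_rows)
        for ell_cols, ell_vals in levels:
            y += (ell_vals * x[ell_cols]).sum(axis=1)
        for r, c, v in zip(*coo):
            y[r] += v * x[c]
        return y

Each iteration moves at least one nonzero from every non-empty row into an ELL level, so the remainder shrinks strictly and the recursion terminates. On the GPU, each ELL level is a fixed-width block that maps naturally to one thread per row; the NumPy loops above compute the same result serially.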
Pages: 13