Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited by: 0
Authors
Zhao, Zhixiang [1]
Wu, Yanxia [1]
Zhang, Guoyin [1]
Yang, Yiqing [1]
Hong, Ruize [1]
Affiliation
[1] Harbin Engineering University, Department of Computer Science, Harbin, People's Republic of China
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
Chinese Library Classification
TP31 [Computer Software]
Discipline code
081202; 0835
Abstract
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis, and its performance on GPUs is crucial for accelerating many applications. However, SpMV efficiency on GPUs suffers from irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC first splits the input matrix into two portions: an ELLPACK (ELL) portion and a coordinate (COO) portion. The COO portion is then recursively divided into further ELL and COO portions until a predefined termination criterion is met; the criterion is a threshold on the percentage of nonzero elements. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions, based on the distribution of nonzero elements and the memory footprint. The RHC algorithm is designed to fully exploit the advantages of the ELL kernel on GPUs and to achieve high thread-level parallelism. We evaluate the method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection. We compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
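The recursive ELL/COO split described above can be illustrated with a small CPU-only Python sketch. It is not the authors' implementation, and several pieces are hypothetical stand-ins:

- The function `pick_width` chooses each level's ELL width by minimizing an estimated word count. It charges 2 words per ELL slot plus 1 row id per stored row, and 3 words per COO entry. This substitutes for the paper's dynamic partitioning method.
- Each level stores only the rows that still have nonzeros, as a row-compacted ELL slice.
- Recursion stops once the leftover nonzeros drop to `stop_frac` of the total, which stands in for the percentage threshold, or after `max_levels` levels.

On a GPU, each ELL level would typically be one kernel launch with one thread per row over a column-major layout. Here plain loops only check that the pieces reproduce y = Ax.

```python
from dataclasses import dataclass


@dataclass
class EllLevel:
    """One ELL slice. Hypothetical row-compacted layout: only rows that still
    have nonzeros are stored, with their original row ids in `rows`."""
    rows: list   # original row index of each stored row
    cols: list   # per-row column indices, padded with -1 up to `width`
    vals: list   # per-row values, padded with 0.0 up to `width`
    width: int


def pick_width(lengths):
    """Hypothetical width rule: minimise the estimated number of words stored.
    ELL: 2 words (value + column) per slot plus 1 row id per stored row.
    COO: 3 words (row, column, value) per spilled nonzero.
    Ties favour the wider slice."""
    best_k, best_cost = 0, 3 * sum(lengths)
    for k in range(1, max(lengths, default=0) + 1):
        cost = len(lengths) * (2 * k + 1) + 3 * sum(max(0, n - k) for n in lengths)
        if cost <= best_cost:
            best_k, best_cost = k, cost
    return best_k


def rhc_partition(rows, stop_frac=0.05, max_levels=8):
    """Recursively peel ELL slices off the leftover (COO) part.
    rows[i] is a list of (col, val) pairs for row i.
    Returns (levels, coo) with coo as a list of (row, col, val)."""
    total = sum(len(r) for r in rows)
    remaining = {i: list(r) for i, r in enumerate(rows) if r}
    levels = []
    while remaining and len(levels) < max_levels:
        # Termination: leftover nonzeros at or below stop_frac of the total.
        if sum(len(r) for r in remaining.values()) <= stop_frac * total:
            break
        ids = sorted(remaining)
        k = pick_width([len(remaining[i]) for i in ids])
        if k == 0:  # defensive: nothing worth storing as ELL
            break
        cols, vals = [], []
        for i in ids:
            head, tail = remaining[i][:k], remaining[i][k:]
            pad = k - len(head)
            cols.append([c for c, _ in head] + [-1] * pad)
            vals.append([v for _, v in head] + [0.0] * pad)
            if tail:
                remaining[i] = tail  # spill the rest to the next level
            else:
                del remaining[i]
        levels.append(EllLevel(ids, cols, vals, k))
    coo = [(i, c, v) for i in sorted(remaining) for c, v in remaining[i]]
    return levels, coo


def rhc_spmv(levels, coo, x, n_rows):
    """y = A x: one pass per ELL level, then the COO tail."""
    y = [0.0] * n_rows
    for lv in levels:
        for i, cs, vs in zip(lv.rows, lv.cols, lv.vals):
            y[i] += sum(v * x[c] for c, v in zip(cs, vs) if c >= 0)
    for i, c, v in coo:
        y[i] += v * x[c]
    return y


if __name__ == "__main__":
    A = [
        [(0, 1.0), (1, 2.0)],
        [(1, 3.0)],
        [(0, 4.0), (1, 5.0), (2, 6.0), (3, 7.0), (4, 8.0), (5, 9.0)],  # long row
        [(2, 1.0), (5, 2.0)],
        [],
        [(5, 1.0)],
    ]
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    levels, coo = rhc_partition(A, stop_frac=0.4)
    print([lv.width for lv in levels], len(coo))  # [1, 1] 4
    print(rhc_spmv(levels, coo, x, len(A)))  # [5.0, 6.0, 154.0, 15.0, 0.0, 6.0]
```

The demo matrix has one long row. With `stop_frac=0.4`, the last four entries of that row stay in COO. With the default `0.05`, the same matrix gets a third ELL level of width 4, and the COO part ends up empty. The tie rule in `pick_width` prefers wider slices because ELL rows are read with coalesced accesses on a GPU; this is also an assumption, not something the abstract states.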
Pages: 13