Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

被引：0

作者：

Zhao, Zhixiang ^{[1
]}

Wu, Yanxia ^{[1
]}

Zhang, Guoyin ^{[1
]}

Yang, Yiqing ^{[1
]}

Hong, Ruize ^{[1
]}

机构：

[1] Harbin Engn Univ, Dept Comp Sci, Harbin, Peoples R China

来源：

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2025年 / 37卷 / 4-5期

关键词：

GPU; memory bandwidth; sparse matrices; SpMV; OPTIMIZATION; FORMAT; SPMV; SIMD;

D O I：

10.1002/cpe.8366

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis. The performance of SpMV on GPUs is crucial for accelerating various applications. However, the efficiency of SpMV on GPUs is significantly affected by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an Ellpack (ELL) portion and a Coordinate (COO) portion. This partitioning is followed by further recursive division of the COO portion into additional ELL and COO portions, continuing this process until predefined termination criteria, based on a percentage threshold of the number of nonzero elements, are met. Additionally, we introduce a dynamic partitioning method to determine the optimal threshold for partitioning the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and achieve high thread-level parallelism. We evaluated our proposed method on two different NVIDIA GPUs: the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection. We compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13x$$ \times $$, 1.13x$$ \times $$, 1.87x$$ \times $$, and 1.27x$$ \times $$ over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.

引用

页数：13

共 50 条

[41] Performance evaluation of sparse matrix-vector product (SpMV) computation on GPU architecture
Kasmi, Najlae
Mahmoudi, Sidi Ahmed
Zbakh, Mostapha
Manneback, Pierre
2014 SECOND WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2014, : 23 - 27
[42] Sparse Matrix-Vector Multiplication Based on Online Arithmetic
Cherati, Sahar Moradi
Jaberipur, Ghassem
Sousa, Leonel
IEEE ACCESS, 2024, 12 : 87653 - 87664
[43] Processor-efficient sparse matrix-vector multiplication
Heath, LS
Ribbens, CJ
Pemmaraju, SV
COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2004, 48 (3-4) : 589 - 608
[44] Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression
Boukaram, Wajih
Turkiyyah, George
Keyes, David
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, 45 (01):
[45] SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
Hishinuma, Toshiaki
Hasegawa, Hidehiko
Tanaka, Teruo
HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 21 - 34
[46] Joint direct and transposed sparse matrix-vector multiplication for multithreaded CPUs
Kozicky, Claudio
Simecek, Ivan
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (13)
[47] Sparse Matrix-Vector Multiplication Cache Performance Evaluation and Design Exploration
Cui, Jianfeng
Lu, Kai
Liu, Sheng
29TH INTERNATIONAL SYMPOSIUM ON THE MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2021), 2021, : 97 - 103
[48] CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
Liu, Weifeng
Vinter, Brian
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015, : 339 - 350
[49] A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication
Fowers, Jeremy
Ovtcharov, Kalin
Strauss, Karin
Chung, Eric S.
Stitt, Greg
2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014, : 36 - 43
[50] Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores
Chen, YuAng
Yu, Jeffery Xu
53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024, 2024, : 1135 - 1144

← 1 2 3 4 5 →