Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited: 0
Authors
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
Affiliations
[1] Harbin Engineering University, Department of Computer Science, Harbin, People's Republic of China
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
CLC Number
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis, and its performance on GPUs is crucial for accelerating a wide range of applications. However, the efficiency of SpMV on GPUs is significantly limited by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an ELLPACK (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively divided into further ELL and COO portions, and this process continues until a predefined termination criterion, based on a percentage threshold of the number of nonzero elements, is met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and to achieve high thread-level parallelism. We evaluate the proposed method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
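To make the recursive scheme concrete, below is a minimal host-side sketch of the partitioning and the resulting multi-level SpMV. It is an illustration under stated assumptions, not the paper's implementation: the function names, the median-row-length width heuristic, and the 1% termination fraction (stop_frac) are hypothetical stand-ins for the dynamic partitioning method described in the abstract.

    import numpy as np

    def split_ell_coo(rows, cols, vals, n_rows, k):
        # Pack up to k nonzeros per row into a zero-padded ELL block
        # (column indices and values); overflow entries stay in COO.
        ell_cols = np.zeros((n_rows, k), dtype=np.int64)
        ell_vals = np.zeros((n_rows, k), dtype=np.float64)
        slot = np.zeros(n_rows, dtype=np.int64)   # next free ELL slot per row
        coo = []
        for r, c, v in zip(rows, cols, vals):
            if slot[r] < k:
                ell_cols[r, slot[r]] = c
                ell_vals[r, slot[r]] = v
                slot[r] += 1
            else:
                coo.append((r, c, v))
        return (ell_cols, ell_vals), coo

    def rhc_partition(rows, cols, vals, n_rows, stop_frac=0.01):
        # Recursively peel ELL levels off the COO remainder until the
        # remainder holds less than stop_frac of the original nonzeros.
        # stop_frac and the median-width heuristic are assumptions; the
        # paper derives the threshold from the nonzero distribution and
        # the memory footprint.
        total_nnz = len(vals)
        levels = []
        while len(vals) > stop_frac * total_nnz:
            counts = np.bincount(rows, minlength=n_rows)
            k = max(1, int(np.median(counts[counts > 0])))  # hypothetical width
            ell, coo = split_ell_coo(rows, cols, vals, n_rows, k)
            levels.append(ell)
            if not coo:                      # everything fit into ELL levels
                return levels, ([], [], [])
            rows, cols, vals = map(list, zip(*coo))
        return levels, (rows, cols, vals)

    def rhc_spmv(levels, coo, x, n_rows):
        # y = A @ x: one dense, regularly strided pass per ELL level
        # (padded slots hold value 0, so they contribute nothing),
        # plus a final pass over the small COO remainder.
        y = np.zeros(n_rows)
        for ell_cols, ell_vals in levels:
            y += (ell_vals * x[ell_cols]).sum(axis=1)
        for r, c, v in zip(*coo):
            y[r] += v * x[c]
        return y

Each iteration moves at least one nonzero from every non-empty row into an ELL level, so the remainder shrinks strictly and the recursion terminates. On the GPU, each ELL level is a fixed-width block that maps naturally to one thread per row; the NumPy loops above compute the same result serially.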
Pages: 13