Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Times cited: 0

Authors:

Zhao, Zhixiang [1]

Wu, Yanxia [1]

Zhang, Guoyin [1]

Yang, Yiqing [1]

Hong, Ruize [1]

Affiliation:

[1] Harbin Engineering University, Department of Computer Science, Harbin, People's Republic of China

Source:

Concurrency and Computation: Practice and Experience | 2025 / Vol. 37 / Issue 4-5

Keywords:

GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD

DOI:

10.1002/cpe.8366

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification codes:

081202; 0835

Abstract:

Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis, and its performance on GPUs is crucial for accelerating many applications. However, SpMV efficiency on GPUs is significantly limited by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC first splits the initial matrix into two portions: an Ellpack (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively divided into further ELL and COO portions until predefined termination criteria, based on a percentage threshold of the number of nonzero elements, are met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. The RHC algorithm is designed to fully exploit the advantages of the ELL kernel on GPUs and to achieve high thread-level parallelism. We evaluate the proposed method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
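
The recursive ELL/COO split described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the ELL width at each level is chosen with a hypothetical quantile heuristic (`row_fraction`), and `stop_ratio` and `max_levels` are hypothetical parameters standing in for the paper's nonzero-percentage termination criterion. The paper's dynamic partitioning also accounts for memory footprint, which this sketch does not model. On a GPU, each ELL level could map to one ELL kernel launch and the leftover tail to a COO kernel.

```python
import math
from collections import defaultdict


def choose_ell_width(row_lengths, row_fraction):
    """Hypothetical heuristic: smallest width K such that at least
    `row_fraction` of the nonempty rows have length <= K."""
    lengths = sorted(n for n in row_lengths if n > 0)
    if not lengths:
        return 0
    idx = min(len(lengths) - 1, max(0, math.ceil(row_fraction * len(lengths)) - 1))
    return lengths[idx]


def rhc_partition(n_rows, triplets, stop_ratio=0.05, row_fraction=0.75, max_levels=8):
    """Recursively split (row, col, val) triplets into ELL levels plus a COO tail.

    Recursion stops once the remaining COO part holds at most `stop_ratio`
    of the original nonzeros (hypothetical stand-in for the paper's threshold).
    """
    total_nnz = len(triplets)
    levels = []
    remaining = sorted(triplets)  # row-major order, columns ascending
    while remaining and len(levels) < max_levels:
        if len(remaining) <= stop_ratio * total_nnz:
            break
        rows = defaultdict(list)
        for r, c, v in remaining:
            rows[r].append((c, v))
        k = choose_ell_width([len(e) for e in rows.values()], row_fraction)
        # Full-height ELL slab; column index -1 marks a padding slot.
        ell_cols = [[-1] * k for _ in range(n_rows)]
        ell_vals = [[0.0] * k for _ in range(n_rows)]
        leftover = []
        for r, entries in rows.items():
            for j, (c, v) in enumerate(entries):
                if j < k:
                    ell_cols[r][j] = c
                    ell_vals[r][j] = v
                else:
                    leftover.append((r, c, v))  # spills into the next COO portion
        levels.append((ell_cols, ell_vals))
        remaining = leftover
    return levels, remaining


def rhc_spmv(n_rows, levels, coo, x):
    """Compute y = A x by summing every ELL level and the final COO tail."""
    y = [0.0] * n_rows
    for ell_cols, ell_vals in levels:
        for r in range(n_rows):
            for c, v in zip(ell_cols[r], ell_vals[r]):
                if c >= 0:  # skip padding slots
                    y[r] += v * x[c]
    for r, c, v in coo:
        y[r] += v * x[c]
    return y


# Example: row 0 is much longer than the other rows.
triplets = [(0, c, 1.0) for c in range(6)] + [(r, r, 2.0) for r in range(1, 6)] + [(1, 0, 3.0)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
levels, coo = rhc_partition(6, triplets, stop_ratio=0.4)
print(len(levels), len(coo))        # 1 4
print(rhc_spmv(6, levels, coo, x))  # [21.0, 7.0, 6.0, 8.0, 10.0, 12.0]
```

With `stop_ratio=0.4`, recursion stops after one ELL level of width 2, because the 4 entries that spill from the long row are no more than 40% of the 12 nonzeros; they stay in the COO tail. With the default 5%, those entries instead form a second ELL level of width 4 in which five of the six rows are pure padding, which illustrates why a footprint-aware threshold matters. A practical implementation would likely store only the rows that actually spill over at each level rather than full-height slabs.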

Pages: 13