Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks

Cited by: 11
Authors
Lee, Jeongmyung [1 ]
Kang, Seokwon [1 ]
Yu, Yongseung [1 ]
Jo, Yong-Yeon [1 ]
Kim, Sang-Wook [1 ]
Park, Yongjun [1 ]
Affiliation
[1] Hanyang Univ, Dept Comp Sci, Seoul, South Korea
Source
2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020) | 2020
Keywords
Sparse matrix multiplication; sparse network; GPU; linear algebra; EFFICIENT
DOI
10.1109/ICDE48307.2020.00085
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and to extract important information based on a matrix representation. Because it contains a high degree of data parallelism, many efficient implementations using data-parallel programming platforms such as CUDA and OpenCL have been introduced for graphics processing units (GPUs). Several well-known spGEMM techniques, such as cuSPARSE and CUSP, often do not fully utilize GPU resources, owing to load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, although several outer-product-based spGEMM techniques have been proposed to solve the load-balancing problem during expansion, they still do not fully utilize GPU resources, because severe computation-load variations exist among the thread blocks. To address these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computation of each computing unit on the target GPU, based on the outer-product-based expansion process, and reduces memory pressure during the merge process. For expansion, it first identifies the actual computation amount of each block and then performs two thread-block transformation processes according to block characteristics: 1) B-Splitting, which transforms a heavy-computation block into multiple small blocks, and 2) B-Gathering, which aggregates multiple small-computation blocks into a larger block. During merging, it improves overall performance by performing B-Limiting, which limits the number of blocks on each computing unit. Experimental results show that Block Reorganizer improves total kernel-execution performance by 1.43x on average, compared to row-product-based spGEMM, on NVIDIA Titan Xp GPUs with real-world datasets.
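To make the expansion/merge structure concrete, the following is a minimal sketch (plain Python, not the authors' CUDA implementation) of outer-product-based spGEMM: each shared index k contributes the outer product of column k of A with row k of B (the expansion step), and partial products that land on the same (i, j) position are accumulated (the merge step). The function name outer_product_spgemm and the dict-of-lists input format are illustrative assumptions; in the paper's GPU setting, each outer product maps to a thread block, and the per-k work |A[:,k]| x |B[k,:]| is the quantity that B-Splitting and B-Gathering rebalance.

```python
# Sketch of outer-product spGEMM: C = A @ B = sum over k of
# (column k of A) outer (row k of B). Hypothetical data layout:
# A is stored column-wise (CSC-like), B row-wise (CSR-like).
from collections import defaultdict

def outer_product_spgemm(A_cols, B_rows):
    """A_cols[k] -> list of (i, value) pairs in column k of A;
       B_rows[k] -> list of (j, value) pairs in row k of B."""
    C = defaultdict(float)                    # merge buffer keyed by (i, j)
    for k in set(A_cols) & set(B_rows):       # only shared k values contribute
        for i, a in A_cols[k]:                # expansion: one outer product
            for j, b in B_rows[k]:            # per-k work = |A[:,k]| * |B[k,:]|
                C[(i, j)] += a * b            # merge: accumulate partials
    return dict(C)

# Example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]  =>  C = [[0, 3], [8, 0]]
A_cols = {0: [(0, 1.0)], 1: [(1, 2.0)]}
B_rows = {0: [(1, 3.0)], 1: [(0, 4.0)]}
print(outer_product_spgemm(A_cols, B_rows))   # {(0, 1): 3.0, (1, 0): 8.0}
```

Note how skewed column/row lengths make the inner double loop vary sharply across k; on a GPU, this is the thread-block load imbalance the abstract describes.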
Pages: 925-936
Page count: 12