Load-balancing Sparse Matrix Vector Product Kernels on GPUs

被引：34

作者：

Anzt, Hartwig ^{[1
,2
]}

Cojean, Terry ^{[1
]}

Chen, Yen-Chen ^{[3
,7
]}

Dongarra, Jack ^{[2
,4
,5
]}

Flegar, Goran ^{[6
]}

Nayak, Pratik ^{[1
]}

Tomov, Stanimire ^{[2
]}

Tsai, Yuhsiang M. ^{[3
]}

Wang, Weichung ^{[3
]}

机构：

[1] Karlsruhe Inst Technol, Karlsruhe, Germany

[2] Univ Tennessee, Knoxville, TN 37996 USA

[3] Natl Taiwan Univ, Taipei, Taiwan

[4] Oak Ridge Natl Lab, Oak Ridge, TN USA

[5] Univ Manchester, Manchester, Lancs, England

[6] Univ Jaume 1, Castellon De La Plana, Spain

[7] Univ Tokyo, Tokyo, Japan

来源：

ACM TRANSACTIONS ON PARALLEL COMPUTING | 2020年 / 7卷 / 01期

关键词：

Sparse Matrix Vector Product (SpMV); irregular matrices; GPUs; MULTIPLICATION;

D O I：

10.1145/3380930

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for Irregular Matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high performance sparse matrix-vector product (SpMV) kernels designed for Nvidia Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in SIMD-friendly Ellpack format (ELL) format. The ratio between the ELL- and the COO-part is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the Suite Sparse matrix collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily-tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format.

引用

页数：26

共 29 条

[1]

Angerson E., 1990, Proceedings of Supercomputing '90 (Cat. No.90CH2916-5), P2, DOI 10.1109/SUPERC.1990.129995

[2]

[Anonymous], 2017, WHIT NVIDIA TESLA V1

[3]

[Anonymous], 2012, Google's PageRank and Beyond: The Science of Search Engine Rankings

[4]

[Anonymous], 2015, ACM Transactions on Parallel Computing

[5]

Anzt H., 2014, Tech. Rep. ut-eecs-14-727

[6] Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software [J].

Anzt, Hartwig ;

Chen, Yen-Chen ;

Cojean, Terry ;

Dongarra, Jack ;

Flegar, Goran ;

Nayak, Pratik ;

Quintana-Orti, Enrique S. ;

Tsai, Yuhsiang M. ;

Wang, Weichung .

PROCEEDINGS OF THE PLATFORM FOR ADVANCED SCIENTIFIC COMPUTING CONFERENCE (PASC '19), 2019,

[7] Preconditioned Krylov solvers on GPUs [J].

Anzt, Hartwig ;

Gates, Mark ;

Dongarra, Jack ;

Kreutzer, Moritz ;

Wellein, Gerhard ;

Koehler, Martin .

PARALLEL COMPUTING, 2017, 68 :32-44

[8]

Barrett R., 1994, TEMPLATES SOLUTION L

[9]

Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1

[10] A survey on feature selection methods [J].

Chandrashekar, Girish ;

Sahin, Ferat .

COMPUTERS & ELECTRICAL ENGINEERING, 2014, 40 (01) :16-28

← 1 2 3 →