Design Principles for Sparse Matrix Multiplication on the GPU

被引：58

作者：

Yang, Carl ^{[1
,2
]}

Buluc, Aydin ^{[2
,3
]}

Owens, John D. ^{[1
,2
]}

机构：

[1] Univ Calif Davis, Davis, CA 95616 USA

[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA

[3] Univ Calif Berkeley, Berkeley, CA 94720 USA

来源：

EURO-PAR 2018: PARALLEL PROCESSING | 2018年 / 11014卷

基金：

美国国家科学基金会;

关键词：

Sparse matrix multiplication; Parallel; GPU;

D O I：

10.1007/978-3-319-96983-1_48

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices that is crucial to getting excellent performance on SpMM. By combining these two ingredients-(i) merge-based load-balancing and (ii) row-major coalesced memory access we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.

引用

页码：672 / 687

页数：16

共 28 条

[1] Aktulga H.M., 2014, P IPDPS
[2] [Anonymous], 2013, Adv Neural Inf Process Syst
[3] Bai Z., 2000, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Ed. by, DOI DOI 10.1137/1.9780898719581
[4] Baxter Sean., 2015, Modern GPU Library
[5] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
[6] Design of the GraphBLAS API for C
Buluc, Aydin
Mattson, Tim
McMillan, Scott
Moreira, Jose
Yang, Carl
[J]. 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 643 - 652
[7] Buluç A, 2009, SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, P233
[8] Optimizing Sparse Matrix-Matrix Multiplication for the GPU
Dalton, Steven
Olson, Luke
Bell, Nathan
[J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2015, 41 (04):
[9] The University of Florida Sparse Matrix Collection
Davis, Timothy A.
Hu, Yifan
[J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2011, 38 (01):
[10] Dongarra J. J., 2015, P S HIGH PERFORMANCE, P75

← 1 2 3 →