Design Principles for Sparse Matrix Multiplication on the GPU

被引:58
作者
Yang, Carl [1 ,2 ]
Buluc, Aydin [2 ,3 ]
Owens, John D. [1 ,2 ]
机构
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
来源
EURO-PAR 2018: PARALLEL PROCESSING | 2018年 / 11014卷
基金
美国国家科学基金会;
关键词
Sparse matrix multiplication; Parallel; GPU;
D O I
10.1007/978-3-319-96983-1_48
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices that is crucial to getting excellent performance on SpMM. By combining these two ingredients-(i) merge-based load-balancing and (ii) row-major coalesced memory access we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
引用
收藏
页码:672 / 687
页数:16
相关论文
共 28 条
  • [1] Aktulga H.M., 2014, P IPDPS
  • [2] [Anonymous], 2013, Adv Neural Inf Process Syst
  • [3] Bai Z., 2000, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Ed. by, DOI DOI 10.1137/1.9780898719581
  • [4] Baxter Sean., 2015, Modern GPU Library
  • [5] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
  • [6] Design of the GraphBLAS API for C
    Buluc, Aydin
    Mattson, Tim
    McMillan, Scott
    Moreira, Jose
    Yang, Carl
    [J]. 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 643 - 652
  • [7] Buluç A, 2009, SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, P233
  • [8] Optimizing Sparse Matrix-Matrix Multiplication for the GPU
    Dalton, Steven
    Olson, Luke
    Bell, Nathan
    [J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2015, 41 (04):
  • [9] The University of Florida Sparse Matrix Collection
    Davis, Timothy A.
    Hu, Yifan
    [J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2011, 38 (01):
  • [10] Dongarra J. J., 2015, P S HIGH PERFORMANCE, P75