Adaptive Sparse Tiling for Sparse Matrix Multiplication

被引:97
作者
Hong, Changwan [1 ]
Sukumaran-Rajam, Aravind [1 ]
Nisa, Israt [1 ]
Singh, Kunal [1 ]
Sadayappan, P. [1 ]
机构
[1] Ohio State Univ, Columbus, OH 43210 USA
来源
PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19) | 2019年
基金
美国国家科学基金会;
关键词
Sparse Matrix-matrix Multiplication; Sampled Dense-dense Matrix Multiplication; SpMM; SDDMM; GPU; multicore/manycore; Tiling; VECTOR MULTIPLICATION; SPMV; OPTIMIZATION; PERFORMANCE; PRODUCT; LIBRARY;
D O I
10.1145/3293883.3295712
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Tiling is a key technique for data locality optimization and is widely used in high-performance implementations of dense matrix-matrix multiplication for multicore/manycore CPUs and GPUs. However, the irregular and matrix-dependent data access pattern of sparse matrix multiplication makes it challenging to use tiling to enhance data reuse. In this paper, we devise an adaptive tiling strategy and apply it to enhance the performance of two primitives: SpMM (product of sparse matrix and dense matrix) and SDDMM (sampled dense-dense matrix multiplication). In contrast to studies that have resorted to non-standard sparse-matrix representations to enhance performance, we use the standard Compressed Sparse Row (CSR) representation, within which intra-row reordering is performed to enable adaptive tiling. Experimental evaluation using an extensive set of matrices from the Sparse Suite collection demonstrates significant performance improvement over currently available state-ofthe-art alternatives.
引用
收藏
页码:300 / 314
页数:15
相关论文
共 51 条
  • [1] Aktulga Hasan Metin, 2014, PAR DISTR PROC S 201
  • [2] [Anonymous], 2018, ARXIV180308601
  • [3] [Anonymous], 2018, The API reference guide for cuSPARSE, the CUDA sparse matrixlibrary.(v8.0 ed.).
  • [4] Anzt H., 2015, P S HIGH PERF COMP H, P75
  • [5] On improving linear solver performance: A block variant of GMRES
    Baker, AH
    Dennis, JM
    Jessup, ER
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2006, 27 (05) : 1608 - 1626
  • [6] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
  • [7] Buluc A., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P721, DOI 10.1109/IPDPS.2011.73
  • [8] Buluç A, 2008, 2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, P1876
  • [9] Design of the GraphBLAS API for C
    Buluc, Aydin
    Mattson, Tim
    McMillan, Scott
    Moreira, Jose
    Yang, Carl
    [J]. 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 643 - 652
  • [10] Performance optimization and modeling of blocked sparse kernels
    Buttari, Alfredo
    Eijkhout, Victor
    Langou, Julien
    Filippone, Salvatore
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2007, 21 (04) : 467 - 484