Adaptive Sparse Tiling for Sparse Matrix Multiplication

被引：97

作者：

Hong, Changwan ^{[1
]}

Sukumaran-Rajam, Aravind ^{[1
]}

Nisa, Israt ^{[1
]}

Singh, Kunal ^{[1
]}

Sadayappan, P. ^{[1
]}

机构：

[1] Ohio State Univ, Columbus, OH 43210 USA

来源：

PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19) | 2019年

基金：

美国国家科学基金会;

关键词：

Sparse Matrix-matrix Multiplication; Sampled Dense-dense Matrix Multiplication; SpMM; SDDMM; GPU; multicore/manycore; Tiling; VECTOR MULTIPLICATION; SPMV; OPTIMIZATION; PERFORMANCE; PRODUCT; LIBRARY;

D O I：

10.1145/3293883.3295712

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Tiling is a key technique for data locality optimization and is widely used in high-performance implementations of dense matrix-matrix multiplication for multicore/manycore CPUs and GPUs. However, the irregular and matrix-dependent data access pattern of sparse matrix multiplication makes it challenging to use tiling to enhance data reuse. In this paper, we devise an adaptive tiling strategy and apply it to enhance the performance of two primitives: SpMM (product of sparse matrix and dense matrix) and SDDMM (sampled dense-dense matrix multiplication). In contrast to studies that have resorted to non-standard sparse-matrix representations to enhance performance, we use the standard Compressed Sparse Row (CSR) representation, within which intra-row reordering is performed to enable adaptive tiling. Experimental evaluation using an extensive set of matrices from the Sparse Suite collection demonstrates significant performance improvement over currently available state-ofthe-art alternatives.

引用

页码：300 / 314

页数：15

共 51 条

[1] Aktulga Hasan Metin, 2014, PAR DISTR PROC S 201
[2] [Anonymous], 2018, ARXIV180308601
[3] [Anonymous], 2018, The API reference guide for cuSPARSE, the CUDA sparse matrixlibrary.(v8.0 ed.).
[4] Anzt H., 2015, P S HIGH PERF COMP H, P75
[5] On improving linear solver performance: A block variant of GMRES
Baker, AH
Dennis, JM
Jessup, ER
[J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2006, 27 (05) : 1608 - 1626
[6] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
[7] Buluc A., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P721, DOI 10.1109/IPDPS.2011.73
[8] Buluç A, 2008, 2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, P1876
[9] Design of the GraphBLAS API for C
Buluc, Aydin
Mattson, Tim
McMillan, Scott
Moreira, Jose
Yang, Carl
[J]. 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 643 - 652
[10] Performance optimization and modeling of blocked sparse kernels
Buttari, Alfredo
Eijkhout, Victor
Langou, Julien
Filippone, Salvatore
[J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2007, 21 (04) : 467 - 484

← 1 2 3 4 5 6 →