Adaptive Sparse Tiling for Sparse Matrix Multiplication

被引:97
作者
Hong, Changwan [1 ]
Sukumaran-Rajam, Aravind [1 ]
Nisa, Israt [1 ]
Singh, Kunal [1 ]
Sadayappan, P. [1 ]
机构
[1] Ohio State Univ, Columbus, OH 43210 USA
来源
PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19) | 2019年
基金
美国国家科学基金会;
关键词
Sparse Matrix-matrix Multiplication; Sampled Dense-dense Matrix Multiplication; SpMM; SDDMM; GPU; multicore/manycore; Tiling; VECTOR MULTIPLICATION; SPMV; OPTIMIZATION; PERFORMANCE; PRODUCT; LIBRARY;
D O I
10.1145/3293883.3295712
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Tiling is a key technique for data locality optimization and is widely used in high-performance implementations of dense matrix-matrix multiplication for multicore/manycore CPUs and GPUs. However, the irregular and matrix-dependent data access pattern of sparse matrix multiplication makes it challenging to use tiling to enhance data reuse. In this paper, we devise an adaptive tiling strategy and apply it to enhance the performance of two primitives: SpMM (product of sparse matrix and dense matrix) and SDDMM (sampled dense-dense matrix multiplication). In contrast to studies that have resorted to non-standard sparse-matrix representations to enhance performance, we use the standard Compressed Sparse Row (CSR) representation, within which intra-row reordering is performed to enable adaptive tiling. Experimental evaluation using an extensive set of matrices from the Sparse Suite collection demonstrates significant performance improvement over currently available state-ofthe-art alternatives.
引用
收藏
页码:300 / 314
页数:15
相关论文
共 51 条
[11]   Collaborative filtering with privacy [J].
Canny, J .
2002 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, PROCEEDINGS, 2002, :45-57
[12]  
Canny John, 2013, BIGLEARN WORKSH NIPS, P117
[13]  
Dalton Steven, 2014, Cusp: Generic parallel algorithms for sparse matrix and graph computations
[14]   The University of Florida Sparse Matrix Collection [J].
Davis, Timothy A. ;
Hu, Yifan .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2011, 38 (01)
[15]   Building High Performance System for Processing a Daily Large Volume of Chinese Satellites Imagery [J].
Deng, Huawu ;
Huang, Shicun ;
Wang, Qi ;
Pan, Zhiqiang ;
Xin, Yubin .
HIGH-PERFORMANCE COMPUTING IN REMOTE SENSING IV, 2014, 9247
[16]   Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format [J].
Greathouse, Joseph L. ;
Daga, Mayank .
SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, :769-780
[17]  
Guennebaud G., 2010, Eigen v3
[18]  
Han S., 2016, INT C LEARN REPR ICL
[19]   Efficient Sparse-Matrix Multi-Vector Product on GPUs [J].
Hong, Changwan ;
Sukumaran-Rajam, Aravind ;
Bandyopadhyay, Bortik ;
Kim, Jinsung ;
Kurt, Sureyya Emre ;
Nisa, Israt ;
Sabhlok, Shivani ;
Catalyurek, Umit V. ;
Parthasarathy, Srinivasan ;
Sadayappan, P. .
HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, 2018, :66-79
[20]   A fast and high quality multilevel scheme for partitioning irregular graphs [J].
Karypis, G ;
Kumar, V .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1998, 20 (01) :359-392