GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Cited by: 84
Authors
Huang, Guyue [1 ]
Dai, Guohao [1 ]
Wang, Yu [1 ]
Yang, Huazhong [1 ]
Affiliations
[1] Tsinghua Univ, BNRist, Dept Elect Engn, Beijing, Peoples R China
Source
PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20) | 2020
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
LIBRARY;
DOI
10.1109/SC41405.2020.00076
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The acceleration of Graph Neural Networks (GNNs) requires efficient and framework-compatible Sparse-Dense Matrix-Matrix Multiplication (SpMM). From the compatibility perspective, the sophisticated sparse matrix representations in state-of-the-art SpMM designs cause heavy preprocessing overhead for the framework. From the efficiency perspective, optimizations for SpMV (Sparse Matrix-Vector multiplication) do not transfer well to SpMM, leading to redundant and uncoalesced global memory accesses. We propose GE-SpMM, which takes the CSR format consistent with GNN frameworks, enabling integration without format-transformation overhead. We use Coalesced Row Caching to ensure coalesced access to both sparse and dense data in global memory, and Coarse-grained Warp Merging to reduce redundant data loading among GPU warps. Experiments on real-world graph datasets demonstrate up to 1.41x speedup over Nvidia cuSPARSE [1] and up to 1.81x over GraphBLAST [2]. We embed GE-SpMM in GNN frameworks and obtain up to 3.67x speedup on popular GNN models such as GCN [3] and GraphSAGE [4].
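The operation GE-SpMM accelerates — multiplying a sparse CSR matrix (e.g. a graph adjacency matrix) by a dense feature matrix — can be sketched as a plain reference implementation. This is an illustrative CPU sketch of the SpMM semantics only, not the paper's CUDA kernel; the function and variable names are my own:

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    """Reference CSR SpMM: C = A @ B, with A sparse (CSR) and B dense.

    indptr/indices/data are the standard CSR arrays. On the GPU, GE-SpMM
    parallelizes the row loop across warps; Coalesced Row Caching stages
    each row's (indices, data) segment in shared memory so global loads
    of both the sparse row and the dense rows of B are coalesced.
    """
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    for i in range(n_rows):                    # one sparse row per output row
        for jj in range(indptr[i], indptr[i + 1]):
            # accumulate a scaled dense row of B into output row i
            C[i] += data[jj] * B[indices[jj]]
    return C

# Toy 3x3 sparse matrix A = [[1,0,2],[0,3,0],[4,0,0]] in CSR,
# multiplied by a dense 3x2 feature matrix B.
indptr  = np.array([0, 2, 3, 4])
indices = np.array([0, 2, 1, 0])
data    = np.array([1.0, 2.0, 3.0, 4.0])
B = np.arange(6, dtype=float).reshape(3, 2)

C = csr_spmm(indptr, indices, data, B)
# C == [[8., 11.], [6., 9.], [0., 4.]]
```

Because the inner loop reuses whole rows of B, neighboring columns of C can share the loaded sparse row — which is the redundancy that Coarse-grained Warp Merging removes on the GPU by having one warp compute several output columns per sparse-row load.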
Pages: 12
References
31 entries
[1] Anonymous, 2001. SciPy: open source scientific tools for Python. DOI: 10.1002/MP.16056.
[2] Bell N, 2009. STUDENTS GUIDE TO THE MA TESOL, p. 1.
[3] Cheng, Jinghui; Guo, Jin L. C. How Do the Open Source Communities Address Usability and UX Issues? An Exploratory Study. CHI 2018: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, 2018.
[4] Davis, Timothy A.; Hu, Yifan. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, 2011, 38(1).
[5] Fey, Matthias, 2019. P WORKSH REPR LEARN.
[6] Fout, A., 2017. Advances in Neural Information Processing Systems, vol. 30, p. 6530.
[7] Hamilton, W. L., 2017. Advances in Neural Information Processing Systems, vol. 30.
[8] Hong, Changwan; Sukumaran-Rajam, Aravind; Nisa, Israt; Singh, Kunal; Sadayappan, P. Adaptive Sparse Tiling for Sparse Matrix Multiplication. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP '19), 2019, pp. 300-314.
[9] Hong, Changwan; Sukumaran-Rajam, Aravind; Bandyopadhyay, Bortik; Kim, Jinsung; Kurt, Sureyya Emre; Nisa, Israt; Sabhlok, Shivani; Catalyurek, Umit V.; Parthasarathy, Srinivasan; Sadayappan, P. Efficient Sparse-Matrix Multi-Vector Product on GPUs. HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018, pp. 66-79.
[10] Jia, Z. IMPROVING ACCURACY S