Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors

Cited by: 47
Authors
Liu, Weifeng [1]
Vinter, Brian [1]
Affiliation
[1] Univ Copenhagen, Niels Bohr Inst, Blegdamsvej 17, DK-2100 Copenhagen, Denmark
Keywords
Sparse matrices; Sparse matrix-vector multiplication; Compressed sparse row; Speculative execution; Segmented sum; Heterogeneous processors
DOI
10.1016/j.parco.2015.04.004
CLC classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose an SpMV algorithm based on the compressed sparse row (CSR) format that utilizes both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of the heterogeneous processor, generating a possibly incorrect result. The CPU part of the same chip is then triggered to re-arrange the predicted partial sums into a correct result vector. Experimental results on three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. (C) 2015 Elsevier B.V. All rights reserved.
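The two-phase idea in the abstract (a speculative segmented sum on the GPU, then a CPU pass that re-arranges partial sums) can be illustrated with a small sequential sketch. This is not the authors' implementation; it is a minimal Python model under stated assumptions: fixed-size tiles of nonzeros stand in for GPU work units, and the "speculation" is assumed to be the guess that the k-th segment of a tile belongs to row first_row + k, which fails when empty rows fall inside a tile. The names `Tile`, `row_of`, `speculative_phase`, `fixup_phase`, `spmv_speculative` and the `tile_size` parameter are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Tile(NamedTuple):
    start: int             # first nonzero index covered by this tile
    end: int               # one past the last nonzero index in the tile
    first_row: int         # row that owns nonzero `start`
    seg_sums: List[float]  # one partial sum per segment (piece of a row)
    dirty: bool            # True if the speculative row mapping is wrong


def row_of(row_ptr: List[int], p: int) -> int:
    """Index of the non-empty CSR row that owns nonzero p."""
    return bisect_right(row_ptr, p) - 1


def speculative_phase(row_ptr, col_idx, vals, x, tile_size) -> List[Tile]:
    """GPU-like phase (assumed model): segmented sum per tile of nonzeros.

    A segment starts wherever some row starts. Segment k is *assumed* to map
    to row first_row + k; that only holds if no empty rows lie in the tile.
    """
    heads = set(row_ptr[:-1])  # nonzero positions where a row begins
    nnz = row_ptr[-1]
    tiles = []
    for start in range(0, nnz, tile_size):
        end = min(start + tile_size, nnz)
        seg_sums: List[float] = []
        for p in range(start, end):
            prod = vals[p] * x[col_idx[p]]
            if p == start or p in heads:
                seg_sums.append(prod)  # a new segment begins here
            else:
                seg_sums[-1] += prod   # continue the current segment
        first_row = row_of(row_ptr, start)
        # Speculation check: the last segment should land on the true row of end-1.
        dirty = first_row + len(seg_sums) - 1 != row_of(row_ptr, end - 1)
        tiles.append(Tile(start, end, first_row, seg_sums, dirty))
    return tiles


def fixup_phase(row_ptr, tiles) -> List[float]:
    """CPU-like phase: scatter partial sums, recomputing rows for dirty tiles.

    Accumulating with += also merges the pieces of a row split across tiles.
    """
    heads = set(row_ptr[:-1])
    y = [0.0] * (len(row_ptr) - 1)
    for t in tiles:
        if not t.dirty:
            rows = range(t.first_row, t.first_row + len(t.seg_sums))
        else:
            # Recover each segment's true row from its starting nonzero.
            starts = [p for p in range(t.start, t.end) if p == t.start or p in heads]
            rows = [row_of(row_ptr, p) for p in starts]
        for r, s in zip(rows, t.seg_sums):
            y[r] += s
    return y


def spmv_speculative(row_ptr, col_idx, vals, x, tile_size=4) -> List[float]:
    tiles = speculative_phase(row_ptr, col_idx, vals, x, tile_size)
    return fixup_phase(row_ptr, tiles)
```

For example, the 4x4 matrix with rows [1, 0, 2, 0], (empty), [0, 3, 0, 4], [5, 0, 0, 6] has `row_ptr = [0, 2, 2, 4, 6]`, `col_idx = [0, 2, 1, 3, 0, 3]` and `vals = [1, 2, 3, 4, 5, 6]`; with `x = [1, 2, 3, 4]` and `tile_size=4`, the first tile is flagged dirty because of the empty row 1, and the fix-up pass still yields `[7.0, 0.0, 22.0, 29.0]`. In this model, clean tiles need only a cheap contiguous scatter, so the extra CPU work is confined to tiles whose speculation failed.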
Pages: 179-193
Number of pages: 15