Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU
Cited by: 2
Authors:
Yuan Tao [1]
Huang Zhi-Bin [2]
Affiliations:
[1] Jilin Normal Univ, Coll Math, Siping Jilin 136000, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecommun Software &, Beijing 100876, Peoples R China
Source:
INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING | 2016 / Vol. 9 / No. 10
Funding:
China Postdoctoral Science Foundation;
Keywords:
gpu;
sparse matrix;
vector;
shuffle reduction;
DOI:
10.14257/ijgdc.2016.9.10.09
CLC number:
TP31 [Computer Software];
Subject classification codes:
081202; 0835;
Abstract:
GPUs are well suited to accelerating compute-intensive applications to achieve higher throughput in High Performance Computing (HPC). Sparse Matrix-Vector Multiplication (SpMV) is a core HPC kernel, so SpMV throughput on the GPU can affect the throughput of the whole HPC platform. In this paper, we focus on the latency of the reduction routine in the SpMV implementation included in CUSP, which stems from shared-memory accesses and from bank conflicts that arise when multiple threads simultaneously access the same bank. We propose a shuffle-based method that reduces the partial results with warp shuffle instructions instead of through shared memory, in order to improve SpMV throughput on Kepler GPUs. Experiments show that the shuffle method improves throughput over the original SpMV routine in CUSP by up to 9% on average.
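The record does not include the paper's code. As an illustration of the technique the abstract describes, a warp-level reduction of per-thread partial sums via shuffle instructions (rather than shared memory) might look like the following minimal sketch. All names here (`warp_reduce_sum`, `spmv_csr_warp`) are hypothetical and not taken from CUSP; on Kepler (compute capability 3.x) the intrinsic was `__shfl_down`, shown here in its modern `__shfl_down_sync` form.

```cuda
// Hypothetical sketch of shuffle-based reduction for CSR SpMV.
// Each warp processes one matrix row; lanes exchange partial sums
// directly between registers via shuffle instructions, so no shared
// memory is used for the reduction and bank conflicts cannot occur.
__device__ float warp_reduce_sum(float val) {
    // Tree reduction: each step halves the number of live partial sums.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 now holds the warp-wide sum
}

// Vector-style CSR SpMV: one 32-thread warp per row, y = A * x.
__global__ void spmv_csr_warp(int n_rows, const int *row_ptr,
                              const int *col_idx, const float *vals,
                              const float *x, float *y) {
    int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane    = threadIdx.x % 32;
    if (warp_id >= n_rows) return;

    // Each lane accumulates a strided slice of the row's nonzeros.
    float sum = 0.0f;
    for (int j = row_ptr[warp_id] + lane; j < row_ptr[warp_id + 1]; j += 32)
        sum += vals[j] * x[col_idx[j]];

    sum = warp_reduce_sum(sum);   // shuffle reduction, no shared memory
    if (lane == 0) y[warp_id] = sum;
}
```

The shared-memory variant this replaces would stage the 32 partial sums in an `__shared__` array and synchronize between steps; the shuffle version keeps the data in registers, which is the latency saving the abstract attributes its ~9% average throughput gain to.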