Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU

Cited by: 2
Authors
Yuan Tao [1 ]
Huang Zhi-Bin [2 ]
Affiliations
[1] Jilin Normal Univ, Coll Math, Siping Jilin 136000, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecommun Software &, Beijing 100876, Peoples R China
Source
INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING | 2016 / Vol. 9 / No. 10
Funding
China Postdoctoral Science Foundation;
Keywords
gpu; sparse matrix; vector; shuffle reduction;
DOI
10.14257/ijgdc.2016.9.10.09
CLC number
TP31 [Computer Software];
Discipline codes
081202 ; 0835 ;
Abstract
GPUs are well suited to accelerating compute-intensive applications in order to raise throughput in High Performance Computing (HPC). Sparse Matrix-Vector Multiplication (SpMV) is a core kernel in HPC, so SpMV throughput on the GPU can affect the throughput of the whole HPC platform. In this paper, we focus on the latency of the reduction routine in the SpMV implementation included in CUSP, namely shared-memory accesses and the bank conflicts that arise when multiple threads access the same bank simultaneously. We use a shuffle method to reduce the partial results with warp shuffle instructions instead of reducing in shared memory, improving SpMV throughput on Kepler GPUs. Experiments show that the shuffle method improves throughput over the original CUSP SpMV routine by up to 9% on average.
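The warp-shuffle reduction the abstract describes can be illustrated with a minimal sketch. This is not the authors' actual kernel: the function names and the one-warp-per-row layout (as in CUSP's vector kernel) are assumptions, and modern CUDA requires the masked `__shfl_down_sync`, whereas Kepler-era code used the now-deprecated `__shfl_down`.

```cuda
#include <cstdio>

// Sum a value across the 32 lanes of a warp using register-to-register
// shuffles. No shared memory is touched, so no bank conflicts can occur.
__inline__ __device__ float warp_reduce_sum(float val) {
    // Each step folds the upper half of the active lanes onto the lower
    // half: 16, 8, 4, 2, 1. After the loop, lane 0 holds the warp total.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

// Hypothetical kernel: one warp accumulates the partial products of one
// matrix row, then lane 0 writes the reduced dot product.
__global__ void reduce_row_partials(const float* partials, float* out) {
    float v = partials[threadIdx.x];
    float total = warp_reduce_sum(v);
    if (threadIdx.x % 32 == 0)
        out[threadIdx.x / 32] = total;
}
```

A shared-memory reduction of the same partials would need a staging buffer plus `__syncthreads()` barriers between steps; the shuffle version keeps the data in registers, which is the latency saving the paper targets.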
Pages: 99-106
Page count: 8
References
5 items
  • [1] [Anonymous], 2012, CUDA C Programming Guide
  • [2] [Anonymous], 2012, CUDA Toolkit 5.0
  • [3] Demouth J., 2013, GPU Technology Conference, CA, USA
  • [4] Duff I. S., A survey of sparse matrix research, [J]. Proceedings of the IEEE, 1977, 65(04): 500-535
  • [5] Nathan B., 2009, Implementing Sparse