Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA

Cited by: 7
Authors
Xu, Shiming [1]
Lin, Hai Xiang [1]
Xue, Wei [2]
Affiliations
[1] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE NINTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES 2010), 2010
Keywords
SpMV; GP-GPU; NVIDIA CUDA; RCM
DOI
10.1109/DCABES.2010.162
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Discipline codes
081203; 0835
Abstract
In this paper we propose optimizations of sparse matrix-vector multiplication (SpMV) with CUDA based on matrix bandwidth/profile reduction techniques. The time required to access the dense vector is decoupled from the rest of the SpMV computation and measured separately. Reducing the matrix profile lowers the dense-vector access time by 17% for single precision (SP) and 24% for double precision (DP). Reduced matrix bandwidth further enables the column index information to be compressed into shorter integer formats, cutting the time spent accessing matrix data under the ELLPACK format by 17% (SP) and 10% (DP). Over the whole matrix test suite, the overall SpMV speedup is 16% and 12.6% for SP and DP, respectively. The optimization proposed in this paper can be combined with other SpMV optimizations such as register blocking.
Pages: 609-614 (6 pages)
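The abstract above hinges on one concrete trick: once RCM reordering has shrunk the matrix bandwidth, each column index can be stored relative to its row index in a 16-bit integer instead of a 32-bit absolute index, reducing index traffic for the ELLPACK matrix data. The paper itself is not reproduced in this record, so the CUDA kernel below is only a minimal sketch of that idea under assumed conventions: a column-major (slot-major) ELLPACK layout, zero-padded rows, and signed 16-bit offsets valid whenever the half-bandwidth stays below 2^15. All identifiers (spmv_ell_short_idx, col_offset, and so on) are hypothetical, not taken from the paper.

```cuda
#include <cstdint>

// ELLPACK SpMV, y = A*x, with bandwidth-compressed column indices.
// Assumed layout (hypothetical, not from the paper): entry (row, slot)
// of the num_rows x max_nnz_per_row ELLPACK arrays lives at
// slot * num_rows + row, so consecutive threads touch consecutive
// addresses (coalesced). Padded slots hold val == 0 and offset == 0.
__global__ void spmv_ell_short_idx(int num_rows,
                                   int max_nnz_per_row,
                                   const int16_t *col_offset, // col - row; fits in 16 bits
                                   const float   *val,        // nonzero values (SP)
                                   const float   *x,          // dense input vector
                                   float         *y)          // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= num_rows) return;

    float sum = 0.0f;
    for (int slot = 0; slot < max_nnz_per_row; ++slot) {
        size_t i = (size_t)slot * num_rows + row;
        float v = val[i];
        if (v != 0.0f) {                        // skip explicit padding
            int col = row + (int)col_offset[i]; // reconstruct absolute column
            sum += v * x[col];
        }
    }
    y[row] = sum;
}
```

For SP values this shrinks the per-nonzero matrix footprint from 8 bytes (4-byte value plus 4-byte index) to 6 bytes, which is the kind of traffic reduction consistent with the 17% (SP) figure reported in the abstract; whether the paper uses exactly this offset encoding is an assumption here.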
Related papers (showing items [21]-[30] of 50)
  • [21] Iterative Sparse Matrix-Vector Multiplication for Integer Factorization on GPUs
    Schmidt, Bertil
    Aribowo, Hans
    Dang, Hoang-Vu
    EURO-PAR 2011 PARALLEL PROCESSING, PT 2, 2011, 6853 : 413 - 424
  • [22] TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs
    Niu, Yuyao
    Lu, Zhengyang
    Dong, Meichen
    Jin, Zhou
    Liu, Weifeng
    Tan, Guangming
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 68 - 78
  • [23] Automatic tuning of sparse matrix-vector multiplication on multicore clusters
Li, ShiGang
Hu, ChangJun
Zhang, JunChao
Zhang, YunQuan
Science China (Information Sciences), 2015, 58 (09) : 17 - 30
  • [24] Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU
    Zhao, Zhixiang
    Wu, Yanxia
    Zhang, Guoyin
    Yang, Yiqing
    Hong, Ruize
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2025, 37 (4-5)
  • [25] Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
    Tanabe, Noboru
    Ogawa, Yuuka
    Takata, Masami
    Joe, Kazuki
    PROCEEDINGS OF THE 19TH INTERNATIONAL EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2011, : 101 - 108
  • [26] Automatic tuning of sparse matrix-vector multiplication on multicore clusters
Li, ShiGang
Hu, ChangJun
Zhang, JunChao
Zhang, YunQuan
    SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58 (09) : 1 - 14
  • [27] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [28] Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
    Sedaghati, Naser
    Ashari, Arash
    Pouchet, Louis-Noel
    Parthasarathy, Srinivasan
    Sadayappan, P.
    2ND WORKSHOP ON PARALLEL PROGRAMMING FOR ANALYTICS APPLICATIONS (PPAA 2015), 2015, : 17 - 24
  • [29] Merge-based Sparse Matrix-Vector Multiplication (SpMV) using the CSR Storage Format
    Merrill, Duane
    Garland, Michael
    ACM SIGPLAN NOTICES, 2016, 51 (08) : 389 - 390
  • [30] Joint direct and transposed sparse matrix-vector multiplication for multithreaded CPUs
    Kozicky, Claudio
    Simecek, Ivan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (13)