Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA

Cited by: 7
Authors
Xu, Shiming [1]
Lin, Hai Xiang [1]
Xue, Wei [2]
Affiliations
[1] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE NINTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES 2010), 2010
Keywords
SpMV; GP-GPU; NVIDIA CUDA; RCM
DOI
10.1109/DCABES.2010.162
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper we propose optimizing sparse matrix-vector multiplication (SpMV) with CUDA based on matrix bandwidth/profile reduction techniques. The time required to access the dense vector is decoupled from the SpMV computation. By reducing the matrix profile, the time required to access the dense vector is reduced by 17% (for SP) and 24% (for DP). Reduced matrix bandwidth enables compression of the column index information into shorter formats, resulting in a 17% (for SP) and 10% (for DP) reduction in execution time for accessing matrix data under the ELLPACK format. The overall speedup for SpMV is 16% and 12.6% for the whole matrix test suite. The optimization proposed in this paper can be combined with other SpMV optimizations such as register blocking.
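The column index compression mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the paper's actual index format, so the encoding below is an assumption: each column index is stored as a 16-bit offset from the diagonal (`col - row`), which only works when the matrix bandwidth is small, for example after an RCM reordering. The function names (`bandwidth`, `to_ellpack_delta`, `spmv`) are hypothetical, the matrix is assumed square, and plain Python stands in for a CUDA kernel.

```python
from array import array

def bandwidth(rows):
    """Largest |col - row| over all stored nonzeros."""
    return max((abs(c - i) for i, row in enumerate(rows) for c, _ in row),
               default=0)

def to_ellpack_delta(rows):
    """Pack a square matrix into ELLPACK with short column offsets.

    rows[i] is a list of (col, value) pairs for matrix row i.
    Padding slots carry offset 0 and value 0.0, so they contribute nothing.
    """
    if bandwidth(rows) > 32767:
        raise ValueError("bandwidth too large for 16-bit offsets")
    n = len(rows)
    width = max((len(r) for r in rows), default=0)
    offsets = array("h", [0] * (n * width))    # short ints (assumed format)
    values = array("d", [0.0] * (n * width))
    for i, row in enumerate(rows):
        for k, (c, v) in enumerate(row):
            # Column-major layout, as commonly used for ELLPACK on GPUs.
            offsets[k * n + i] = c - i
            values[k * n + i] = v
    return n, width, offsets, values

def spmv(n, width, offsets, values, x):
    """Compute y = A x from the offset-encoded ELLPACK arrays."""
    y = [0.0] * n
    for i in range(n):
        for k in range(width):
            col = i + offsets[k * n + i]       # recover the full column index
            y[i] += values[k * n + i] * x[col]
    return y

# 4x4 tridiagonal example: bandwidth 1, so every offset is -1, 0, or 1.
rows = [
    [(0, 2.0), (1, -1.0)],
    [(0, -1.0), (1, 2.0), (2, -1.0)],
    [(1, -1.0), (2, 2.0), (3, -1.0)],
    [(2, -1.0), (3, 2.0)],
]
n, width, offsets, values = to_ellpack_delta(rows)
y = spmv(n, width, offsets, values, [1.0, 2.0, 3.0, 4.0])
```

With a narrower index type, less index data has to be read per nonzero, which is the kind of saving in matrix data access that the abstract reports; the measured percentages come from the paper, not from this sketch.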
Pages: 609-614
Number of pages: 6
Related papers
50 in total
  • [31] Sparse Matrix-Vector Multiplication Cache Performance Evaluation and Design Exploration
    Cui, Jianfeng
    Lu, Kai
    Liu, Sheng
    29TH INTERNATIONAL SYMPOSIUM ON THE MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2021), 2021, : 97 - 103
  • [32] CoAdELL: Adaptivity and Compression for Improving Sparse Matrix-Vector Multiplication on GPUs
    Maggioni, Marco
    Berger-Wolf, Tanya
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 934 - 941
  • [33] Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
    Zhang, Kai
    Chen, Shuming
    Wang, Yaohua
    Wan, Jianghua
    IEICE ELECTRONICS EXPRESS, 2013, 10 (09):
  • [34] SpDRAM: Efficient In-DRAM Acceleration of Sparse Matrix-Vector Multiplication
    Kang, Jieui
    Choi, Soeun
    Lee, Eunjin
    Sim, Jaehyeong
    IEEE ACCESS, 2024, 12 : 176009 - 176021
  • [35] CUDA GPU libraries and novel sparse matrix-vector multiplication - Implementation and performance enhancement in unstructured finite element computations
    Haney R.
    Mohan R.
    International Journal of Computational Science and Engineering, 2019, 20 (04): : 501 - 507
  • [36] Auto-tuning of Sparse Matrix-Vector Multiplication on Graphics Processors
    Abu-Sufah, Walid
    Karim, Asma Abdel
    SUPERCOMPUTING (ISC 2013), 2013, 7905 : 151 - 164
  • [37] Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Intel Xeon Phi
    Elafrou, Athena
    Goumas, Georgios
    Koziris, Nectarios
    2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 1389 - 1398
  • [38] Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs
    Li, Shiqing
    Liu, Di
    Liu, Weichen
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [39] A Fully Structure-Driven Performance Analysis of Sparse Matrix-Vector Multiplication
    Sandhu, Prabhjot
    Verbrugge, Clark
    Hendren, Laurie
    PROCEEDINGS OF THE ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE'20), 2020, : 108 - 119
  • [40] CUDA GPU libraries and novel sparse matrix-vector multiplication-implementation and performance enhancement in unstructured finite element computations
    Haney, Richard
    Mohan, Ram
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2019, 20 (04) : 501 - 507