Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs

Cited by: 11
Authors
Yoshizawa, Hiroki [1 ]
Takahashi, Daisuke [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, 1-1-1 Tennodai, Tsukuba, Ibaraki 3058573, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 3058573, Japan
Source
15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012) | 2012
Funding
Japan Science and Technology Agency (JST);
Keywords
SpMV; CRS; CG; GPGPU; CUDA;
DOI
10.1109/ICCSE.2012.28
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Performance of sparse matrix-vector multiplication (SpMV) on GPUs depends strongly on the structure of the sparse matrix, the computing environment, and the selection of certain parameters. In this paper, we show that the performance of the SpMV kernel for the compressed row storage (CRS) format on GPUs depends greatly on the choice of a single parameter, and we propose an efficient algorithm for selecting the optimal value automatically. With automatic parameter selection, the CRS SpMV kernel achieves up to approximately 26% improvement over NVIDIA's CUSPARSE library. The conjugate gradient method is the most popular iterative method for solving sparse systems of linear equations, and the SpMV kernel accounts for the bulk of its computation. By optimizing SpMV with our approach, the conjugate gradient method performs up to approximately 10% better than CULA Sparse.
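The abstract does not name the tuned parameter. In CRS/CSR SpMV kernels on CUDA GPUs, the usual tunable is the number of threads that cooperate on each matrix row (as in the well-known "vector" CSR kernel), so the sketch below assumes that parameter. The kernel, the brute-force timing loop, and all identifiers (csr_spmv_kernel, select_threads_per_row, THREADS_PER_ROW) are illustrative and are not taken from the paper, which proposes a more efficient selection algorithm than exhaustive benchmarking.

// Hedged sketch (not from the paper): a CUDA CRS (CSR) SpMV kernel whose
// tunable parameter is THREADS_PER_ROW, the number of threads cooperating
// on one matrix row, plus a brute-force host-side selection loop.

#include <cuda_runtime.h>

template <int THREADS_PER_ROW>
__global__ void csr_spmv_kernel(int num_rows,
                                const int    *__restrict__ row_ptr,
                                const int    *__restrict__ col_idx,
                                const double *__restrict__ val,
                                const double *__restrict__ x,
                                double       *__restrict__ y)
{
    // Consecutive groups of THREADS_PER_ROW threads each handle one row.
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int row  = tid / THREADS_PER_ROW;
    int lane = tid % THREADS_PER_ROW;

    double sum = 0.0;
    if (row < num_rows)
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += THREADS_PER_ROW)
            sum += val[j] * x[col_idx[j]];

    // Reduce the partial sums within each group of THREADS_PER_ROW lanes.
    for (int offset = THREADS_PER_ROW / 2; offset > 0; offset /= 2)
        sum += __shfl_down_sync(0xffffffff, sum, offset, THREADS_PER_ROW);

    if (row < num_rows && lane == 0)
        y[row] = sum;
}

// Time one launch for a given candidate value (illustrative only).
template <int TPR>
static float time_candidate(int n, const int *rp, const int *ci,
                            const double *v, const double *x, double *y)
{
    int threads = 128;
    int blocks  = (n * TPR + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    csr_spmv_kernel<TPR><<<blocks, threads>>>(n, rp, ci, v, x, y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Exhaustively benchmark the power-of-two candidates and keep the fastest.
// The paper's point is to avoid this exhaustive search with a cheaper
// automatic selection; this loop only illustrates what is being selected.
int select_threads_per_row(int n, const int *rp, const int *ci,
                           const double *v, const double *x, double *y)
{
    float best_ms = 1e30f;
    int   best    = 1;
    float t;
    if ((t = time_candidate<1 >(n, rp, ci, v, x, y)) < best_ms) { best_ms = t; best = 1;  }
    if ((t = time_candidate<2 >(n, rp, ci, v, x, y)) < best_ms) { best_ms = t; best = 2;  }
    if ((t = time_candidate<4 >(n, rp, ci, v, x, y)) < best_ms) { best_ms = t; best = 4;  }
    if ((t = time_candidate<8 >(n, rp, ci, v, x, y)) < best_ms) { best_ms = t; best = 8;  }
    if ((t = time_candidate<16>(n, rp, ci, v, x, y)) < best_ms) { best_ms = t; best = 16; }
    if ((t = time_candidate<32>(n, rp, ci, v, x, y)) < best_ms) { best_ms = t; best = 32; }
    return best;
}

The reason such a parameter matters is that matrices with short rows leave most cooperating lanes idle when the value is large, while matrices with long rows need several lanes per row to expose enough memory-level parallelism, so the best value varies with the matrix structure.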
Pages: 130-136
Page count: 7