Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs

Cited by: 11
Authors
Yoshizawa, Hiroki [1 ]
Takahashi, Daisuke [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, 1-1-1 Tennodai, Tsukuba, Ibaraki 3058573, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 3058573, Japan
Source
15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012) | 2012
Funding
Japan Science and Technology Agency (JST)
Keywords
SpMV; CRS; CG; GPGPU; CUDA
DOI
10.1109/ICCSE.2012.28
CLC number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
The performance of sparse matrix-vector multiplication (SpMV) on GPUs depends strongly on the structure of the sparse matrix, the computing environment, and the selection of certain kernel parameters. In this paper, we show that the performance of the SpMV kernel for the compressed row storage (CRS) format on GPUs depends greatly on selecting an optimal parameter value, and we propose an efficient algorithm that selects this parameter automatically. With automatic parameter selection, our SpMV kernel for the CRS format achieves up to approximately a 26% improvement over NVIDIA's CUSPARSE library. The conjugate gradient (CG) method is the most popular iterative method for solving sparse systems of linear equations, and SpMV accounts for the bulk of its computation. By optimizing SpMV with our approach, the CG method performs up to approximately 10% better than CULA Sparse.
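The abstract does not name the tuned parameter, but for CRS-format SpMV on GPUs it is plausibly the number of threads cooperating on each row, as in CSR-vector-style kernels. The following minimal CUDA sketch illustrates that idea under this assumption; the kernel name spmv_crs, the warp-shuffle reduction, and the pick_threads_per_row heuristic are illustrative inventions, not the authors' implementation or the CUSPARSE API.

#include <cuda_runtime.h>

// Sketch of a CRS (CSR) SpMV kernel where the tunable parameter is the
// number of threads assigned per row (a power of two between 1 and 32).
// Assumed names and structure; not the paper's exact kernel.
template <int THREADS_PER_ROW>
__global__ void spmv_crs(int num_rows,
                         const int   *row_ptr,  // row offsets, length num_rows + 1
                         const int   *col_idx,  // column index of each nonzero
                         const float *val,      // value of each nonzero
                         const float *x,        // input vector
                         float       *y)        // output vector, y = A * x
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int row  = tid / THREADS_PER_ROW;   // each thread group owns one row
    int lane = tid % THREADS_PER_ROW;

    // Strided partial sum over the row's nonzeros.
    float sum = 0.0f;
    if (row < num_rows)
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += THREADS_PER_ROW)
            sum += val[j] * x[col_idx[j]];

    // Combine the THREADS_PER_ROW partial sums with warp shuffles.
    // No thread returned early above, so the full-warp mask is safe.
    for (int offset = THREADS_PER_ROW / 2; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, offset, THREADS_PER_ROW);

    if (row < num_rows && lane == 0)
        y[row] = sum;
}

// Plausible host-side heuristic (an assumption, not the paper's algorithm):
// the smallest power of two at least the average nonzeros per row, capped at 32.
int pick_threads_per_row(int num_rows, int nnz)
{
    int avg = (nnz + num_rows - 1) / num_rows;
    int t = 1;
    while (t < avg && t < 32) t <<= 1;
    return t;
}

A host driver would instantiate spmv_crs<T> for T in {1, 2, 4, 8, 16, 32}, launch it over ceil(num_rows * T / block_size) blocks, and either apply the heuristic above or time each variant and keep the fastest, which is the usual pattern for auto-tuners of this kind.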
Pages: 130-136
Number of pages: 7