An efficient SIMD compression format for sparse matrix-vector multiplication

Cited by: 13
Authors
Chen, Xinhai [1 ]
Xie, Peizhen [1 ]
Chi, Lihua [2 ]
Liu, Jie [1 ]
Gong, Chunye [1 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Changsha, Hunan, Peoples R China
[2] Hunan Inst Traff Engn, Inst Adv Sci & Technol, Hengyang, Peoples R China
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2018, Vol. 30, Issue 23
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)
Keywords
compressed sparse row format (CSR); performance optimization; single instruction multiple data (SIMD); sparse matrix-vector multiplication (SpMV); SPMV; GPU; OPTIMIZATION; CPU
DOI
10.1002/cpe.4800
CLC classification number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Sparse matrix-vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used storage format for sparse matrices. However, CSR-based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR-SIMD. The new format compresses the non-zero elements into variable-length data fragments whose memory access addresses are consecutive. This improves the data locality of the sparse matrix A and the dense vector x, and the floating-point operations on each fragment can be fully vectorized on wide SIMD units. Our experimental results indicate that CSR-SIMD offers better storage efficiency and low format-conversion overhead. Moreover, the new format scales well on wide SIMD units. Compared with CSR-based and BCSR-based SpMV, CSR-SIMD achieves better performance on the FT1500A, Intel Xeon, and Intel Xeon Phi.
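The fragment idea described in the abstract can be sketched as follows. This is an illustrative reading of the abstract only, not the paper's exact data structure: the function names (`build_fragments`, `fragment_spmv`) and the `(row, start_col, values)` fragment layout are hypothetical. Each fragment groups a row's non-zeros with consecutive column indices, so the matching slice of x is contiguous in memory and the inner loop maps to one vectorized multiply-accumulate.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Baseline CSR sparse matrix-vector product y = A @ x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

def build_fragments(values, col_idx, row_ptr):
    """Group each row's non-zeros into runs of consecutive column indices
    (hypothetical CSR-SIMD-style fragments). Each fragment touches a
    contiguous slice of x, so a SIMD unit could fetch it with one
    vector load instead of an index-by-index gather."""
    fragments = []  # entries: (row, start_col, [values of the run])
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):
        k = row_ptr[i]
        while k < row_ptr[i + 1]:
            start = k
            # Extend the run while column indices stay consecutive.
            while (k + 1 < row_ptr[i + 1]
                   and col_idx[k + 1] == col_idx[k] + 1):
                k += 1
            k += 1
            fragments.append((i, col_idx[start], values[start:k]))
    return fragments

def fragment_spmv(fragments, x, n_rows):
    """SpMV over fragments; the inner sum runs over consecutive x entries
    and stands in for one vectorized multiply-accumulate per fragment."""
    y = [0.0] * n_rows
    for row, c0, vals in fragments:
        y[row] += sum(v * x[c0 + j] for j, v in enumerate(vals))
    return y
```

For the 3x3 matrix [[1,2,0],[0,3,4],[5,0,6]] in CSR form, rows 0 and 1 each become a single two-element fragment, while row 2 splits into two one-element fragments because its columns (0 and 2) are not consecutive; both kernels produce the same y.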
Pages: 10
Related papers (50 total)
  • [1] An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication
    Karakasis, Vasileios
    Gkountouvas, Theodoros
    Kourtis, Kornilios
    Goumas, Georgios
    Koziris, Nectarios
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, 24 (10) : 1930 - 1940
  • [2] CSR&RV: An Efficient Value Compression Format for Sparse Matrix-Vector Multiplication
    Yan, Junjun
    Chen, Xinhai
    Liu, Jie
    NETWORK AND PARALLEL COMPUTING, NPC 2022, 2022, 13615 : 54 - 60
  • [3] A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS
    Kreutzer, Moritz
    Hager, Georg
    Wellein, Gerhard
    Fehske, Holger
    Bishop, Alan R.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (05): : C401 - C423
  • [4] VBSF: a new storage format for SIMD sparse matrix-vector multiplication on modern processors
    Li, Yishui
    Xie, Peizhen
    Chen, Xinhai
    Liu, Jie
    Yang, Bo
    Li, Shengguo
    Gong, Chunye
    Gan, Xinbiao
    Xu, Han
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (03): : 2063 - 2081
  • [5] TaiChi: A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU
    Gao, Jianhua
    Ji, Weixing
    Tan, Zhaonian
    Wang, Yizhuo
    Shi, Feng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3732 - 3745
  • [6] Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format
    Greathouse, Joseph L.
    Daga, Mayank
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 769 - 780
  • [7] Adaptive Hybrid Storage Format for Sparse Matrix-Vector Multiplication on Multi-Core SIMD CPUs
    Chen, Shizhao
    Fang, Jianbin
    Xu, Chuanfu
    Wang, Zheng
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [8] Structured sparse matrix-vector multiplication on massively parallel SIMD architectures
    Dehn, T
    Eiermann, M
    Giebermann, K
    Sperling, V
    PARALLEL COMPUTING, 1995, 21 (12) : 1867 - 1894
  • [9] Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
    Zhang, Kai
    Chen, Shuming
    Wang, Yaohua
    Wan, Jianghua
    IEICE ELECTRONICS EXPRESS, 2013, 10 (09):
  • [10] SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
    Hishinuma, Toshiaki
    Hasegawa, Hidehiko
    Tanaka, Teruo
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 21 - 34