An efficient SIMD compression format for sparse matrix-vector multiplication

Cited by: 13
Authors
Chen, Xinhai [1 ]
Xie, Peizhen [1 ]
Chi, Lihua [2 ]
Liu, Jie [1 ]
Gong, Chunye [1 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Changsha, Hunan, Peoples R China
[2] Hunan Inst Traff Engn, Inst Adv Sci & Technol, Hengyang, Peoples R China
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2018, Vol. 30, Issue 23
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)
Keywords
compressed sparse row format (CSR); performance optimization; single instruction multiple data (SIMD); sparse matrix-vector multiplication (SpMV); SPMV; GPU; OPTIMIZATION; CPU
DOI
10.1002/cpe.4800
CLC classification number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Sparse matrix-vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used storage format for sparse matrices. However, CSR-based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR-SIMD. The new format compresses the non-zero elements into variable-length data fragments whose memory access addresses are consecutive. This improves the data locality of the sparse matrix A and the dense vector x, and the floating-point operations on each fragment can be fully vectorized on wide SIMD units. Our experimental results indicate that CSR-SIMD offers better storage efficiency and low format-conversion overhead. Moreover, the new format scales well on wide SIMD units. Compared with CSR-based and BCSR-based SpMV, CSR-SIMD achieves better performance on the FT1500A, Intel Xeon, and Intel Xeon Phi.
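The fragment idea described in the abstract can be sketched as follows. This is an illustrative reading of the abstract only, not the paper's exact data structure: the function names (`build_fragments`, `fragment_spmv`) and the `(row, start_col, values)` fragment layout are hypothetical. Each fragment groups a row's non-zeros with consecutive column indices, so the matching slice of x is contiguous in memory and the inner loop maps to one vectorized multiply-accumulate.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Baseline CSR sparse matrix-vector product y = A @ x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

def build_fragments(values, col_idx, row_ptr):
    """Group each row's non-zeros into runs of consecutive column indices
    (hypothetical CSR-SIMD-style fragments). Each fragment touches a
    contiguous slice of x, so a SIMD unit could fetch it with one
    vector load instead of an index-by-index gather."""
    fragments = []  # entries: (row, start_col, [values of the run])
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):
        k = row_ptr[i]
        while k < row_ptr[i + 1]:
            start = k
            # Extend the run while column indices stay consecutive.
            while (k + 1 < row_ptr[i + 1]
                   and col_idx[k + 1] == col_idx[k] + 1):
                k += 1
            k += 1
            fragments.append((i, col_idx[start], values[start:k]))
    return fragments

def fragment_spmv(fragments, x, n_rows):
    """SpMV over fragments; the inner sum runs over consecutive x entries
    and stands in for one vectorized multiply-accumulate per fragment."""
    y = [0.0] * n_rows
    for row, c0, vals in fragments:
        y[row] += sum(v * x[c0 + j] for j, v in enumerate(vals))
    return y
```

For the 3x3 matrix [[1,2,0],[0,3,4],[5,0,6]] in CSR form, rows 0 and 1 each become a single two-element fragment, while row 2 splits into two one-element fragments because its columns (0 and 2) are not consecutive; both kernels produce the same y.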
Pages: 10
Related papers (50 total)
  • [1] An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication
    Karakasis, Vasileios
    Gkountouvas, Theodoros
    Kourtis, Kornilios
    Goumas, Georgios
    Koziris, Nectarios
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, 24 (10) : 1930 - 1940
  • [2] CSR&RV: An Efficient Value Compression Format for Sparse Matrix-Vector Multiplication
    Yan, Junjun
    Chen, Xinhai
    Liu, Jie
    NETWORK AND PARALLEL COMPUTING, NPC 2022, 2022, 13615 : 54 - 60
  • [3] A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS
    Kreutzer, Moritz
    Hager, Georg
    Wellein, Gerhard
    Fehske, Holger
    Bishop, Alan R.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (05): : C401 - C423
  • [4] VBSF: a new storage format for SIMD sparse matrix-vector multiplication on modern processors
    Li, Yishui
    Xie, Peizhen
    Chen, Xinhai
    Liu, Jie
    Yang, Bo
    Li, Shengguo
    Gong, Chunye
    Gan, Xinbiao
    Xu, Han
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (03): : 2063 - 2081
  • [5] TaiChi: A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU
    Gao, Jianhua
    Ji, Weixing
    Tan, Zhaonian
    Wang, Yizhuo
    Shi, Feng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3732 - 3745
  • [6] Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format
    Greathouse, Joseph L.
    Daga, Mayank
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 769 - 780
  • [7] Adaptive Hybrid Storage Format for Sparse Matrix-Vector Multiplication on Multi-Core SIMD CPUs
    Chen, Shizhao
    Fang, Jianbin
    Xu, Chuanfu
    Wang, Zheng
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [8] Structured sparse matrix-vector multiplication on massively parallel SIMD architectures
    Dehn, T
    Eiermann, M
    Giebermann, K
    Sperling, V
    PARALLEL COMPUTING, 1995, 21 (12) : 1867 - 1894
  • [9] Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
    Zhang, Kai
    Chen, Shuming
    Wang, Yaohua
    Wan, Jianghua
    IEICE ELECTRONICS EXPRESS, 2013, 10 (09):
  • [10] SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
    Hishinuma, Toshiaki
    Hasegawa, Hidehiko
    Tanaka, Teruo
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 21 - 34