Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV

Cited by: 0
Authors
Hu, Siyi [1 ]
Ito, Makiko [2 ]
Yoshikawa, Takahide [2 ]
He, Yuan [3 ]
Nakamura, Hiroshi [1 ]
Kondo, Masaaki [3 ,4 ]
Affiliations
[1] Univ Tokyo, Tokyo 1138656, Japan
[2] Fujitsu Ltd, Kawasaki 2118588, Japan
[3] Keio Univ, Yokohama 2238522, Japan
[4] RIKEN, Kobe 6500047, Japan
Keywords
SpMV; memory bandwidth; data compression; mixed-precision
DOI
10.1587/transinf.2023PAP0008
CLC Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is among the most popular kernels in linear algebra. This is especially true for fully-connected MLP layers, whose computations are dominated by SpMV and which play a substantial role in diverse services; as a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparse storage formats such as CSR and CSC, SpMV kernels still suffer from limited memory bandwidth during data transfers because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and the floating-point data used in SpMV kernels are transferred as-is, without any pre-processing. We therefore believe that bandwidth conservation techniques, such as data compression, can dramatically help SpMV kernels when data is transferred between main memory and the Last Level Cache (LLC). Furthermore, we observe that the convergence behavior of some typical scientific computing benchmarks (based on SpMV kernels) is not degraded when lower-precision floating-point data is adopted. Based on these findings, we propose a simple yet effective data compression scheme that can be extended to general-purpose computing architectures and HPC systems. When adopted, it achieves a best-case speedup of 1.92x. Moreover, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both convergence speed and the accuracy of final results.
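The two ideas the abstract combines, a CSR-format SpMV kernel and lossy precision reduction of floating-point data to save memory bandwidth, can be illustrated with a short sketch. The C code below is a minimal software approximation, not the paper's implementation: the proposed scheme compresses data in hardware between main memory and the LLC, whereas this sketch (with a hypothetical csr_t layout and spmv_csr function) simply stores the CSR nonzero values as 32-bit floats, halving their memory traffic, and widens them back to double inside the kernel.

#include <stdio.h>

/* Minimal CSR SpMV sketch (hypothetical names, not the paper's code).
 * Nonzero values are kept in fp32 instead of fp64, roughly emulating
 * the effect of moving lossily compressed data across the memory bus;
 * the paper's actual scheme compresses data in hardware between main
 * memory and the LLC. */
typedef struct {
    int n_rows;
    const int   *row_ptr;  /* length n_rows + 1 */
    const int   *col_idx;  /* length nnz */
    const float *val;      /* nnz values, lossily stored in fp32 */
} csr_t;

static void spmv_csr(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n_rows; i++) {
        double sum = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += (double)A->val[k] * x[A->col_idx[k]];  /* widen on use */
        y[i] = sum;
    }
}

int main(void) {
    /* 3x3 example matrix: [[4,0,1],[0,3,0],[2,0,5]] in CSR form */
    const int   row_ptr[] = {0, 2, 3, 5};
    const int   col_idx[] = {0, 2, 1, 0, 2};
    const float val[]     = {4.f, 1.f, 3.f, 2.f, 5.f};
    const csr_t A = {3, row_ptr, col_idx, val};
    const double x[] = {1.0, 2.0, 3.0};
    double y[3];
    spmv_csr(&A, x, y);
    printf("%g %g %g\n", y[0], y[1], y[2]);  /* prints: 7 6 17 */
    return 0;
}

In iterative solvers such as CG or PageRank, the accumulation in double is what keeps the precision loss from compounding across iterations, which is consistent with the abstract's observation that convergence is largely unaffected by lower-precision operands.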
Pages: 2015-2025
Page count: 11