Dual-side Sparse Tensor Core

Cited by: 53
Authors
Wang, Yang [1 ,2 ]
Zhang, Chen [2 ]
Xie, Zhiqiang [1 ,3 ]
Guo, Cong [1 ,4 ]
Liu, Yunxin [2 ]
Leng, Jingwen [4 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft Res, Redmond, WA 98052 USA
[3] ShanghaiTech Univ, Shanghai, Peoples R China
[4] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Source
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
Neural Networks; Graphics Processing Units; General Sparse Matrix-Matrix Multiplication; Convolution; Pruning; PERFORMANCE;
DOI
10.1109/ISCA52012.2021.00088
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Leveraging sparsity in deep neural network (DNN) models is promising for accelerating model inference. Yet existing GPUs can only leverage the sparsity from weights but not activations, which are dynamic, unpredictable, and hence challenging to exploit. In this work, we propose a novel architecture to efficiently harness the dual-side sparsity (i.e., weight and activation sparsity). We take a systematic approach to understand the (dis)advantages of previous sparsity-related architectures and propose a novel, unexplored paradigm that combines an outer-product computation primitive with a bitmap-based encoding format. We demonstrate the feasibility of our design with minimal changes to the existing production-scale inner-product-based Tensor Core. We propose a set of novel ISA extensions and co-design the matrix-matrix multiplication and convolution algorithms, which are the two dominant computation patterns in today's DNN models, to exploit our new dual-side sparse Tensor Core. Our evaluation shows that our design can fully unleash the dual-side DNN sparsity and improve performance by up to one order of magnitude with small hardware overhead.
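The core idea of the abstract, combining an outer-product computation primitive with bitmap encoding, can be illustrated in software. Below is a minimal NumPy sketch, not the paper's hardware design or ISA extensions; the function names (bitmap_encode, outer_product_spgemm) are hypothetical. It shows why the outer-product formulation exposes dual-side sparsity: the k-th partial product touches only column k of A and row k of B, so a zero on either side lets the whole rank-1 update be skipped, even for activation sparsity that is only known at run time.

```python
import numpy as np

def bitmap_encode(mat):
    # Bitmap format: a boolean presence mask plus the densely
    # packed nonzero values in row-major order.
    bitmap = mat != 0
    return bitmap, mat[bitmap]

def bitmap_decode(bitmap, values, dtype=np.float64):
    # Inverse of bitmap_encode: scatter packed values back.
    out = np.zeros(bitmap.shape, dtype=dtype)
    out[bitmap] = values
    return out

def outer_product_spgemm(A, B):
    # C = sum_k outer(A[:, k], B[k, :]). With dual-side sparsity,
    # the k-th outer product is elided entirely whenever column k
    # of A or row k of B is all zero (checked via the bitmaps).
    m, K = A.shape
    K2, n = B.shape
    assert K == K2
    a_bits = A != 0
    b_bits = B != 0
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for k in range(K):
        if a_bits[:, k].any() and b_bits[k, :].any():
            C += np.outer(A[:, k], B[k, :])
    return C

# Quick check against the dense reference result.
rng = np.random.default_rng(0)
A = rng.random((8, 16)) * (rng.random((8, 16)) > 0.7)   # ~70% zeros
B = rng.random((16, 8)) * (rng.random((16, 8)) > 0.7)
assert np.allclose(outer_product_spgemm(A, B), A @ B)
```

In this formulation the bitmaps double as the skip logic: testing the column/row occupancy bits is enough to decide whether a whole rank-1 update can be dropped, which is what makes the encoding attractive for dynamic activation sparsity.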
Pages: 1083-1095
Page count: 13