Dual-side Sparse Tensor Core

Cited by: 53
Authors
Wang, Yang [1 ,2 ]
Zhang, Chen [2 ]
Xie, Zhiqiang [1 ,3 ]
Guo, Cong [1 ,4 ]
Liu, Yunxin [2 ]
Leng, Jingwen [4 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft Res, Redmond, WA 98052 USA
[3] ShanghaiTech Univ, Shanghai, Peoples R China
[4] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Source
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
Neural Networks; Graphics Processing Units; General Sparse Matrix-Matrix Multiplication; Convolution; Pruning; PERFORMANCE;
DOI
10.1109/ISCA52012.2021.00088
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Leveraging sparsity in deep neural network (DNN) models is promising for accelerating model inference. Yet existing GPUs can only leverage the sparsity from weights but not activations, which are dynamic, unpredictable, and hence challenging to exploit. In this work, we propose a novel architecture to efficiently harness the dual-side sparsity (i.e., weight and activation sparsity). We take a systematic approach to understand the (dis)advantages of previous sparsity-related architectures and propose a novel, unexplored paradigm that combines an outer-product computation primitive with a bitmap-based encoding format. We demonstrate the feasibility of our design with minimal changes to the existing production-scale inner-product-based Tensor Core. We propose a set of novel ISA extensions and co-design the matrix-matrix multiplication and convolution algorithms, which are the two dominant computation patterns in today's DNN models, to exploit our new dual-side sparse Tensor Core. Our evaluation shows that our design can fully unleash the dual-side DNN sparsity and improve performance by up to one order of magnitude with small hardware overhead.
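The core idea of the abstract, combining an outer-product computation primitive with bitmap encoding, can be illustrated in software. Below is a minimal NumPy sketch, not the paper's hardware design or ISA extensions; the function names (bitmap_encode, outer_product_spgemm) are hypothetical. It shows why the outer-product formulation exposes dual-side sparsity: the k-th partial product touches only column k of A and row k of B, so a zero on either side lets the whole rank-1 update be skipped, even for activation sparsity that is only known at run time.

```python
import numpy as np

def bitmap_encode(mat):
    # Bitmap format: a boolean presence mask plus the densely
    # packed nonzero values in row-major order.
    bitmap = mat != 0
    return bitmap, mat[bitmap]

def bitmap_decode(bitmap, values, dtype=np.float64):
    # Inverse of bitmap_encode: scatter packed values back.
    out = np.zeros(bitmap.shape, dtype=dtype)
    out[bitmap] = values
    return out

def outer_product_spgemm(A, B):
    # C = sum_k outer(A[:, k], B[k, :]). With dual-side sparsity,
    # the k-th outer product is elided entirely whenever column k
    # of A or row k of B is all zero (checked via the bitmaps).
    m, K = A.shape
    K2, n = B.shape
    assert K == K2
    a_bits = A != 0
    b_bits = B != 0
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for k in range(K):
        if a_bits[:, k].any() and b_bits[k, :].any():
            C += np.outer(A[:, k], B[k, :])
    return C

# Quick check against the dense reference result.
rng = np.random.default_rng(0)
A = rng.random((8, 16)) * (rng.random((8, 16)) > 0.7)   # ~70% zeros
B = rng.random((16, 8)) * (rng.random((16, 8)) > 0.7)
assert np.allclose(outer_product_spgemm(A, B), A @ B)
```

In this formulation the bitmaps double as the skip logic: testing the column/row occupancy bits is enough to decide whether a whole rank-1 update can be dropped, which is what makes the encoding attractive for dynamic activation sparsity.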
Pages: 1083-1095
Page count: 13