Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks

Cited by: 29
Authors
Wang, Xiaowei [1 ]
Yu, Jiecao [1 ]
Augustine, Charles [2 ]
Iyer, Ravi [2 ]
Das, Reetuparna [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Intel Corp, Santa Clara, CA 95051 USA
Funding
US National Science Foundation
Keywords
In-Memory Computing; Cache; Neural Network Pruning; Low Precision Neural Network;
DOI
10.1109/HPCA.2019.00029
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, an in-SRAM architecture for accelerating Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The network redundancy is exploited in two ways. First, we prune and fine-tune the trained network model and develop two distinct methods, coalescing and overlapping, to run inference efficiently with the resulting sparse models. Second, we propose an architecture for network models with reduced bit width that leverages bit-serial computation. The proposed architecture achieves a 17.7x/3.7x speedup over a server-class CPU/GPU and a 1.6x speedup over the most closely related in-cache accelerator, with a 2% area overhead per processor die and no loss in top-1 accuracy for AlexNet. With a relaxed accuracy limit, our tunable architecture achieves even higher speedups.
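The bit-serial computation the abstract refers to can be illustrated with a short sketch. The snippet below is a NumPy illustration of the general technique, not the paper's in-SRAM implementation: the hypothetical helper bit_serial_dot evaluates an unsigned integer dot product one weight bit plane at a time, so a model quantized to fewer bits needs proportionally fewer passes (compute cycles in hardware), which is how a reduced bit width translates into speedup.

```python
import numpy as np

def bit_serial_dot(w: np.ndarray, x: np.ndarray, bits: int = 8) -> int:
    """Dot product with the weight operand processed one bit plane at a time.

    Software sketch only (unsigned weights): each loop iteration handles a
    single bit position of `w`. An in-SRAM design would evaluate the AND and
    the accumulation across thousands of bit-lines in parallel, so halving
    the weight bit width halves the number of passes.
    """
    acc = 0
    for b in range(bits):
        plane = (w >> b) & 1          # bit plane b: one bit of every weight
        acc += int(plane @ x) << b    # partial sum, shifted into position
    return acc

# Sanity check against the direct dot product.
rng = np.random.default_rng(0)
w = rng.integers(0, 256, size=1024)   # 8-bit unsigned weights
x = rng.integers(0, 256, size=1024)   # 8-bit unsigned activations
assert bit_serial_dot(w, x, bits=8) == int(w @ x)
```

Under these assumptions, calling bit_serial_dot with bits=4 on 4-bit weights would take half the passes of the 8-bit case while producing an exact result, which mirrors the latency/precision trade-off the tunable architecture exposes.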
Pages: 81-93
Page count: 13
Related Papers (50 in total)
  • [1] Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
    Eckert, Charles
    Wang, Xiaowei
    Wang, Jingcheng
    Subramaniyan, Arun
    Iyer, Ravi
    Sylvester, Dennis
    Blaauw, David
    Das, Reetuparna
    IEEE MICRO, 2019, 39 (03) : 11 - 19
  • [2] Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
    Eckert, Charles
    Wang, Xiaowei
    Wang, Jingcheng
    Subramaniyan, Arun
    Iyer, Ravi
    Sylvester, Dennis
    Blaauw, David
    Das, Reetuparna
    2018 ACM/IEEE 45TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2018, : 383 - 396
  • [3] Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
    Gong, Cheng
    Lu, Ye
    Xie, Kunpeng
    Jin, Zongming
    Li, Tao
    Wang, Yanzhi
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 3178 - 3193
  • [4] PROCESSING CONVOLUTIONAL NEURAL NETWORKS ON CACHE
    Vieira, Joao
    Roma, Nuno
    Falcao, Gabriel
    Tomas, Pedro
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 1658 - 1662
  • [5] Acceleration of Deep Convolutional Neural Networks Using Adaptive Filter Pruning
    Singh, Pravendra
    Verma, Vinay Kumar
    Rai, Piyush
    Namboodiri, Vinay P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (04) : 838 - 847
  • [6] Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
    He, Yang
    Ding, Yuhang
    Liu, Ping
    Zhu, Linchao
    Zhang, Hanwang
    Yang, Yi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2006 - 2015
  • [7] Towards Acceleration of Deep Convolutional Neural Networks using Stochastic Computing
    Li, Ji
    Ren, Ao
    Li, Zhe
    Ding, Caiwen
    Yuan, Bo
    Qiu, Qinru
    Wang, Yanzhi
    2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017, : 115 - 120
  • [8] Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
    Zhang, Chen
    Fang, Zhenman
    Zhou, Peipei
    Pan, Peichen
    Cong, Jason
    2016 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2016,
  • [9] Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
    Zhang, Chen
    Sun, Guangyu
    Fang, Zhenman
    Zhou, Peipei
    Pan, Peichen
    Cong, Jason
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (11) : 2072 - 2085
  • [10] Filter pruning via annealing decaying for deep convolutional neural networks acceleration
    Huang, Jiawen
    Xiong, Liyan
    Huang, Xiaohui
    Chen, Qingsen
    Huang, Peng
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (02):