Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks

Cited by: 29
Authors
Wang, Xiaowei [1 ]
Yu, Jiecao [1 ]
Augustine, Charles [2 ]
Iyer, Ravi [2 ]
Das, Reetuparna [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Intel Corp, Santa Clara, CA 95051 USA
Funding
US National Science Foundation;
Keywords
In-Memory Computing; Cache; Neural Network Pruning; Low Precision Neural Network;
DOI
10.1109/HPCA.2019.00029
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, an in-SRAM architecture that accelerates Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The network redundancy is exploited in two ways. First, we prune and fine-tune the trained network model and develop two distinct methods, coalescing and overlapping, to run inference efficiently with sparse models. Second, we propose an architecture for network models with reduced bit width that leverages bit-serial computation. Our proposed architecture achieves a 17.7x/3.7x speedup over a server-class CPU/GPU, and a 1.6x speedup over the relevant in-cache accelerator, with a 2% area overhead per processor die and no loss in top-1 accuracy for AlexNet. With a relaxed accuracy limit, our tunable architecture achieves even higher speedups.
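As a software illustration of the two ideas in the abstract, the short Python sketch below mimics magnitude pruning of a weight tensor and a bit-serial dot product that consumes low-precision weights one bit-plane at a time. It is not the paper's in-SRAM design: the function names, the 4-bit weight width, the 50% sparsity target, and the array sizes are all illustrative assumptions, and in the accelerator itself the per-bit partial sums come from bit-line operations inside the cache arrays rather than from NumPy calls.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the smallest-magnitude weights so that roughly `sparsity` of the
    # entries become zero; the pruned model would then be fine-tuned.
    flat = np.abs(weights).ravel()
    threshold = np.sort(flat)[int(sparsity * flat.size)]
    return np.where(np.abs(weights) < threshold, 0, weights)

def bit_serial_dot(weights_q, activations, bit_width=4):
    # Dot product of unsigned `bit_width`-bit integer weights with integer
    # activations, accumulated one weight bit-plane at a time (LSB first).
    acc = 0
    for b in range(bit_width):
        bit_plane = (weights_q >> b) & 1               # 0/1 plane for bit position b
        partial = int(np.dot(bit_plane, activations))  # AND-and-accumulate style step
        acc += partial << b                            # scale the partial sum by 2^b
    return acc

rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=64)   # illustrative 4-bit unsigned weights
x = rng.integers(0, 8, size=64)    # illustrative integer activations
w_sparse = magnitude_prune(w)      # sparse weights a coalesced/overlapped layout could exploit
assert bit_serial_dot(w, x) == int(np.dot(w, x))   # bit-serial result matches the direct dot product

The final check confirms that summing the bit-plane partial products, each shifted by its bit position, reproduces the full-precision integer dot product.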
Pages: 81 - 93
Page count: 13
Related Papers
50 in total
  • [31] Fpar: filter pruning via attention and rank enhancement for deep convolutional neural networks acceleration
    Chen, Yanming
    Wu, Gang
    Shuai, Mingrui
    Lou, Shubin
    Zhang, Yiwen
    An, Zhulin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2973 - 2985
  • [32] Bit Efficient Quantization for Deep Neural Networks
    Nayak, Prateeth
    Zhang, David
    Chai, Sek
    FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 52 - 56
  • [33] MXQN: Mixed quantization for reducing bit-width of weights and activations in deep convolutional neural networks
    Huang, Chenglong
    Liu, Puguang
    Fang, Liang
    APPLIED INTELLIGENCE, 2021, 51 (07) : 4561 - 4574
  • [35] Adversarial Robustness of Multi-bit Convolutional Neural Networks
    Frickenstein, Lukas
    Sampath, Shambhavi Balamuthu
    Mori, Pierpaolo
    Vemparala, Manoj-Rohit
    Fasfous, Nael
    Frickenstein, Alexander
    Unger, Christian
    Passerone, Claudio
    Stechele, Walter
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 3, INTELLISYS 2023, 2024, 824 : 157 - 174
  • [36] Plug and Play Deep Convolutional Neural Networks
    Neary, Patrick
    Allan, Vicki
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 388 - 395
  • [37] An Efficient Accelerator for Deep Convolutional Neural Networks
    Kuo, Yi-Xian
    Lai, Yeong-Kang
    2020 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TAIWAN), 2020,
  • [38] Elastography mapped by deep convolutional neural networks
    Liu, DongXu
    Kruggel, Frithjof
    Sun, LiZhi
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2021, (07) : 1567 - 1574
  • [39] Predicting enhancers with deep convolutional neural networks
    Min, Xu
    Zeng, Wanwen
    Chen, Shengquan
    Chen, Ning
    Chen, Ting
    Jiang, Rui
    BMC BIOINFORMATICS, 2017, 18
  • [40] Metaphase finding with deep convolutional neural networks
    Moazzen, Yaser
    Capar, Abdulkerim
    Albayrak, Abdulkadir
    Calik, Nurullah
    Toreyin, Behcet Ugur
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2019, 52 : 353 - 361