Improving Inference Latency and Energy of Network-on-Chip based Convolutional Neural Networks through Weights Compression

Cited by: 6
Authors
Ascia, Giuseppe [1 ]
Catania, Vincenzo [1 ]
Jose, John [2 ]
Monteleone, Salvatore [3 ,4 ]
Palesi, Maurizio [1 ]
Patti, Davide [1 ]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
[2] Indian Inst Technol Guwahati, Dept Comp Sci & Engn, Gauhati, Assam, India
[3] CY Cergy Paris Univ, ENSEA, CNRS, CY Adv Studies, Paris, France
[4] CY Cergy Paris Univ, ENSEA, CNRS, ETIS Lab, Paris, France
Source
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020) | 2020
Keywords
Deep neural network accelerator; weights compression; approximate deep neural network; accuracy/latency/energy trade-off; ACCELERATOR
DOI
10.1109/IPDPSW50202.2020.00017
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Network-on-Chip (NoC) based Convolutional Neural Network (CNN) accelerators are energy- and performance-limited by communication traffic. In fact, to run an inference, the on-chip and off-chip traffic generated to fetch the parameters of the network, namely, filters and weights, accounts for a large fraction of the energy and latency. This paper presents a technique for compressing the network parameters in such a way as to reduce the amount of traffic needed to fetch them, thus improving the overall performance and energy figures of the accelerator. The lossy nature of the proposed compression technique degrades the accuracy of the network; we show, nevertheless, that this degradation is widely justified by the achievable improvements in latency and energy consumption. The proposed technique is applied to several widespread CNN models, and the trade-off between accuracy, inference latency, and inference energy is discussed. We show that up to a 63% inference latency reduction and a 67% inference energy reduction can be achieved with less than 5% top-5 accuracy degradation, without retraining the network.
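The abstract does not detail the compression scheme itself. As a minimal, purely illustrative sketch of one lossy, retraining-free approach in the same spirit, the Python fragment below applies post-training uniform quantization to a pretrained filter bank so that far fewer bits per weight would need to cross the NoC; the function names, the 5-bit width, and the tensor shape are assumptions for illustration, not the authors' method.

import numpy as np

def compress_weights(weights, n_bits=5):
    """Lossy, retraining-free compression of a weight tensor via
    uniform quantization. Hypothetical illustration; the paper's
    actual compression scheme is not specified in this abstract."""
    w_min, w_max = float(weights.min()), float(weights.max())
    levels = 2 ** n_bits - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    # Only the small integer codes plus (w_min, scale) travel to the accelerator.
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, w_min, scale

def decompress_weights(codes, w_min, scale):
    """Reconstruct approximate weights on the accelerator side."""
    return codes.astype(np.float32) * scale + w_min

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A stand-in conv filter bank: 64 filters of shape 3x3x3.
    w = rng.normal(0.0, 0.05, size=(64, 3, 3, 3)).astype(np.float32)
    codes, w_min, scale = compress_weights(w, n_bits=5)
    w_hat = decompress_weights(codes, w_min, scale)
    print("max reconstruction error:", float(np.abs(w - w_hat).max()))

Packed 5-bit codes would cut the fetched weight volume by roughly 32/5, about 6.4x versus 32-bit floats; the reconstruction error is the accuracy cost, mirroring the latency/energy vs. top-5 accuracy trade-off reported above.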
Pages: 54-63
Number of pages: 10