Improving Inference Latency and Energy of Network-on-Chip based Convolutional Neural Networks through Weights Compression

Cited by: 6
Authors
Ascia, Giuseppe [1 ]
Catania, Vincenzo [1 ]
Jose, John [2 ]
Monteleone, Salvatore [3 ,4 ]
Palesi, Maurizio [1 ]
Patti, Davide [1 ]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
[2] Indian Inst Technol Guwahati, Dept Comp Sci & Engn, Gauhati, Assam, India
[3] CY Cergy Paris Univ, ENSEA, CNRS, CY Adv Studies, Paris, France
[4] CY Cergy Paris Univ, ENSEA, CNRS, ETIS Lab, Paris, France
Source
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020) | 2020
Keywords
Deep neural network accelerator; weights compression; approximate deep neural network; accuracy/latency/energy trade-off; ACCELERATOR;
DOI
10.1109/IPDPSW50202.2020.00017
CLC number (Chinese Library Classification)
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Network-on-Chip (NoC) based Convolutional Neural Network (CNN) accelerators are energy and performance limited by communication traffic. To run an inference, the traffic generated both on-chip and off-chip to fetch the network parameters, namely, filters and weights, accounts for a large fraction of the total energy and latency. This paper presents a technique for compressing the network parameters so as to reduce the traffic required to fetch them, thus improving the overall performance and energy figures of the accelerator. The lossy nature of the proposed compression technique degrades the accuracy of the network, but we show that this degradation is widely justified by the achievable improvements in latency and energy consumption. The proposed technique is applied to several widespread CNN models, and the trade-off between accuracy, inference latency, and inference energy is discussed. We show that up to 63% inference latency reduction and 67% inference energy reduction can be achieved with less than 5% top-5 accuracy degradation, without retraining the network.
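The abstract does not specify the compression scheme itself. As an illustration only, the sketch below shows one common lossy weights-compression approach in the same spirit, post-training uniform quantization, which shrinks the weight-fetch traffic at some accuracy cost. This is not the paper's method; the function names, the `bits` parameter, and the uint16 code type are all hypothetical choices made here for the example.

```python
import numpy as np

def compress_weights(w: np.ndarray, bits: int = 8):
    """Lossily compress a float32 weight tensor by uniform quantization.

    Hypothetical illustration (not the paper's scheme): mapping weights to
    `bits`-bit integer codes cuts on/off-chip weight traffic by roughly
    32/bits versus float32.
    """
    w_min, w_max = float(w.min()), float(w.max())
    levels = (1 << bits) - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = np.round((w - w_min) / scale).astype(np.uint16)  # quantized codes
    return codes, w_min, scale

def decompress_weights(codes: np.ndarray, w_min: float, scale: float):
    """Reconstruct approximate weights at the processing element."""
    return codes.astype(np.float32) * scale + w_min

# Usage: ~4x less weight traffic at 8 bits vs. float32, with a bounded
# reconstruction error that shows up as a small accuracy degradation.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # a conv filter bank
codes, w_min, scale = compress_weights(w, bits=8)
w_hat = decompress_weights(codes, w_min, scale)
print("max reconstruction error:", float(np.abs(w - w_hat).max()))
```

Since no retraining is assumed (as in the paper), the quantization step directly bounds the per-weight error; the paper's reported results quantify how such errors translate into top-5 accuracy loss versus latency and energy gains.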
Pages: 54-63
Page count: 10