Accelerator-Aware Pruning for Convolutional Neural Networks

Cited by: 71
Authors
Kang, Hyeong-Ju [1 ]
Affiliation
[1] Korea Univ Technol & Educ, Sch Comp Sci & Engn, Cheonan 31253, South Korea
Funding
National Research Foundation, Singapore
Keywords
Accelerator architectures; Field programmable gate arrays; Acceleration; Convolutional neural networks; Deep learning; neural network pruning; neural network accelerator
DOI
10.1109/TCSVT.2019.2911674
CLC classification
TM [Electrotechnics]; TN [Electronics and Communication Technology]
Discipline codes
0808; 0809
Abstract
Convolutional neural networks have shown tremendous performance capabilities in computer vision tasks, but their excessive amounts of weight storage and arithmetic operations prevent them from being adopted in embedded environments. One of the solutions is pruning, where certain unimportant weights are forced to have a value of zero. Many pruning schemes have been proposed, but these have mainly focused on the number of pruned weights, scarcely considering ASIC or FPGA accelerator architectures. When a pruned network is run on an accelerator, the lack of architecture consideration causes inefficiency problems, including internal buffer misalignment and load imbalance. This paper proposes a new pruning scheme that reflects accelerator architectures. In the proposed scheme, pruning is performed so that the same number of weights remain in each weight group corresponding to activations fetched simultaneously. In this way, the pruning scheme resolves the inefficiency problems, doubling the accelerator performance. Even with this constraint, the proposed pruning scheme reached a pruning ratio similar to that of previous unconstrained pruning schemes, not only on AlexNet and VGG16 but also on state-of-the-art very deep networks such as ResNet. Furthermore, the proposed scheme demonstrated a comparable pruning ratio on compact networks such as MobileNet and on slimmed networks that were already pruned in a channel-wise manner. In addition to improving the efficiency of previous sparse accelerators, it will also be shown that the proposed pruning scheme can be used to reduce the logic complexity of sparse accelerators.
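The core constraint described in the abstract, keeping the same number of nonzero weights in every group of weights whose activations are fetched in the same cycle, can be sketched as balanced magnitude pruning. The following NumPy snippet is a minimal illustration under assumed details (grouping consecutive weights along the last axis, magnitude as the importance criterion, and the function name are illustrative, not the paper's exact algorithm):

```python
import numpy as np

def accelerator_aware_prune(weights, group_size, keep_per_group):
    """Zero the smallest-magnitude weights in each consecutive group of
    `group_size` weights (modelling weights paired with simultaneously
    fetched activations), so every group keeps exactly `keep_per_group`
    nonzeros. Illustrative sketch, not the paper's exact procedure."""
    assert weights.size % group_size == 0
    flat = weights.reshape(-1, group_size).copy()
    # per-group indices of weights sorted by ascending magnitude
    order = np.argsort(np.abs(flat), axis=1)
    # prune the (group_size - keep_per_group) smallest in each group
    prune_idx = order[:, : group_size - keep_per_group]
    np.put_along_axis(flat, prune_idx, 0.0, axis=1)
    return flat.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))          # toy weight matrix
pruned = accelerator_aware_prune(w, group_size=4, keep_per_group=1)
nnz_per_group = np.count_nonzero(pruned.reshape(-1, 4), axis=1)
print(nnz_per_group)                  # every group keeps exactly 1 weight
```

Because each group ends up with an identical nonzero count, a sparse accelerator that fetches one group of activations per cycle sees no load imbalance across its multiplier lanes, which is the inefficiency the paper targets.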
Pages: 2093-2103 (11 pages)