A Mixed-Pruning Based Framework for Embedded Convolutional Neural Network Acceleration

Cited by: 34
Authors
Chang, Xuepeng [1 ]
Pan, Huihui [1 ]
Lin, Weiyang [1 ]
Gao, Huijun [1 ]
Affiliations
[1] Harbin Inst Technol, Res Inst Intelligent Control & Syst, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural network; model compression; hardware acceleration; FPGA; EFFICIENT; CNN;
DOI
10.1109/TCSI.2020.3048260
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
Convolutional neural networks (CNNs) have proved to be an effective method in the field of artificial intelligence (AI), and large-scale deployment of CNNs on embedded devices would greatly promote the adoption of AI in practical industry. However, owing mainly to the space-time complexity of CNNs, computing power, memory bandwidth, and flexibility are performance bottlenecks. This paper proposes a framework combining model compression and hardware acceleration to address these problems. The framework consists of a mixed pruning method, data storage optimization for efficient memory utilization, and an accelerator for mapping CNNs onto a field-programmable gate array (FPGA). The mixed pruning method compresses the model, and data quantization reduces the data bit-width to 8 bits. The FPGA-based accelerator makes the CNN implementation flexible, configurable, and efficient. Model compression is evaluated on an NVIDIA RTX 2080 Ti; the results show that VGG16 is compressed by 30x and a fully convolutional network (FCN) by 11x, within 1% accuracy loss. The compressed model is deployed and accelerated on a ZCU102 board, achieving up to 1.7x and 24.5x better energy efficiency than the RTX 2080 Ti and the Intel i7-7700, respectively.
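The abstract's 8-bit quantization step can be illustrated with a small sketch. The abstract does not describe the paper's exact quantization scheme, so the symmetric per-tensor scheme below (with illustrative helper names `quantize_int8` and `dequantize`, not taken from the paper) is only one common way weights are reduced to 8-bit for fixed-point inference:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to int8.

    Note: this is one common scheme, used here purely as an illustration;
    the paper's actual quantization method is not specified in the abstract.
    """
    scale = float(np.max(np.abs(w))) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

# Toy weight vector standing in for a convolution kernel
w = np.array([0.5, -1.27, 0.01, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(dequantize(q, s) - w))  # rounding error is bounded by scale / 2
```

Storing `q` instead of `w` cuts weight memory by 4x versus float32, which is the kind of bandwidth saving the data storage optimization in the framework targets.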
Pages: 1706 - 1715
Number of pages: 10
References
44 in total
[1] [Anonymous], 2015, arXiv preprint.
[2] [Anonymous], 2018, arXiv:1808.06866.
[3] Buluç A, 2009, SPAA'09: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, p. 233.
[4] Chen, Zhen; Chen, Zhibo; Lin, Jianxin; Liu, Sen; Li, Weiping. Deep Neural Network Acceleration Based on Low-Rank Approximated Channel Pruning. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(4): 1232-1244.
[5] Guan, Tianchan; Liu, Peiye; Zeng, Xiaoyang; Kim, Martha; Seok, Mingoo. Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66(7): 2593-2605.
[6] Guo, Kaiyuan; Sui, Lingzhi; Qiu, Jiantao; Yu, Jincheng; Wang, Junbin; Yao, Song; Han, Song; Wang, Yu; Yang, Huazhong. Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(1): 35-47.
[7] Han, Song; Liu, Xingyu; Mao, Huizi; Pu, Jing; Pedram, Ardavan; Horowitz, Mark A.; Dally, William J. EIE: Efficient Inference Engine on Compressed Deep Neural Network. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016: 243-254.
[8] He, Yang; Liu, Ping; Wang, Ziwei; Hu, Zhilan; Yang, Yi. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 4335-4344.
[9] Huang, Qiangui; Zhou, Kevin; You, Suya; Neumann, Ulrich. Learning to Prune Filters in Convolutional Neural Networks. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018: 709-718.
[10] Jacob, Benoit; Kligys, Skirmantas; Chen, Bo; Zhu, Menglong; Tang, Matthew; Howard, Andrew; Adam, Hartwig; Kalenichenko, Dmitry. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 2704-2713.