A fast and scalable architecture to run convolutional neural networks in low density FPGAs

Cited by: 18
Authors
Vestias, Mario P. [1 ]
Duarte, Rui P. [2 ]
de Sousa, Jose T. [2 ]
Neto, Horacio C. [2 ]
Affiliations
[1] Inst Politecn Lisboa, ISEL, INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
Keywords
Deep learning; Convolutional neural network; Smart edge devices; FPGA;
DOI
10.1016/j.micpro.2020.103136
CLC classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Deep learning and, in particular, convolutional neural networks (CNN) achieve very good results on several computer vision applications, such as security and surveillance, where image and video analysis are required. These networks are quite demanding in terms of computation and memory and are therefore usually implemented on high-performance computing platforms or devices. Running CNNs on embedded platforms or devices with low computational and memory resources requires careful optimization of system architectures and algorithms to obtain very efficient designs. In this context, Field Programmable Gate Arrays (FPGA) can achieve this efficiency, since the programmable hardware fabric can be tailored to each specific network. In this paper, a very efficient configurable architecture for CNN inference targeting FPGAs of any density is described. The architecture uses fixed-point arithmetic and image batching to reduce computation, memory, and memory bandwidth requirements without compromising network accuracy. The developed architecture supports the execution of large CNNs on any FPGA device, including those with small on-chip memory and limited logic resources. With the proposed architecture, it is possible to infer an image in AlexNet in 4.3 ms on a ZYNQ7020 and 1.2 ms on a ZYNQ7045. (c) 2020 Elsevier B.V. All rights reserved.
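The fixed-point quantization the abstract refers to can be illustrated with a minimal sketch. The bit widths below (8-bit words, 4 fractional bits) are hypothetical illustrative choices, not the paper's actual configuration:

```python
import numpy as np

def to_fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize a float array to signed fixed-point with `frac_bits`
    fractional bits, saturating to the representable range.
    Bit widths are illustrative, not the paper's scheme."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # most negative code
    hi = (1 << (total_bits - 1)) - 1       # most positive code
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def from_fixed_point(q, frac_bits=4):
    """Recover the approximate float value from fixed-point codes."""
    return q.astype(np.float64) / (1 << frac_bits)

# Example: quantize a few hypothetical weight values.
w = np.array([0.7312, -1.25, 0.0625])
q = to_fixed_point(w)
w_hat = from_fixed_point(q)
# Rounding error per value is at most half an LSB, i.e. 2**-frac_bits / 2.
```

Storing weights and activations in a narrow fixed-point format like this is what cuts memory footprint and bandwidth relative to 32-bit floats, at the cost of a bounded quantization error.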
Pages: 15