An FPGA-Based Energy-Efficient Reconfigurable Convolutional Neural Network Accelerator for Object Recognition Applications

Cited by: 50
Authors
Li, Jixuan [1]
Un, Ka-Fai [1]
Yu, Wei-Han [1]
Mak, Pui-In [1]
Martins, Rui P. [1]
Affiliations
[1] Univ Macau, Fac Sci & Technol, State Key Lab Analog & Mixed Signal VLSI IME & DE, Macau, Peoples R China
Keywords
Frequency modulation; Kernel; Throughput; Parallel processing; Memory management; Field programmable gate arrays; Computational efficiency; Computation efficiency; convolutional neural network (CNN); FPGA; object recognition; reconfigurability
DOI
10.1109/TCSII.2021.3095283
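Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract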
The computational efficiency is the prime concern of a computation-intensive deep convolutional neural network (CNN). In this Brief, we report an FPGA-based computation-efficient reconfigurable CNN accelerator. It innovates in the utilization of a kernel partition technique to substantially reduce the repeated access to the input feature maps and the kernels. As a result, it balances the ability for parallel computing while consuming less system power. Experimental results prove that the proposed CNN accelerator achieves a peak throughput of 220.0 GOP/s with an energy efficiency of 22.9 GOPs/W at 151.4 frames/s for the AlexNet. It is also reconfigurable to process VGG-16 befitting complex object recognition.
Pages: 3143-3147
Number of pages: 5