A Real-Time FPGA Accelerator Based on Winograd Algorithm for Underwater Object Detection

Cited by: 11
Authors
Cai, Liangwei [1]
Wang, Ceng [1]
Xu, Yuan [2]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518000, Peoples R China
[2] Shenzhen Technol Univ, Coll Big Data & Internet, Shenzhen 518000, Peoples R China
Keywords
underwater object detection; U-Net; MobileNetV3; Winograd algorithm; deep neural networks; CNN accelerator; convolution
DOI
10.3390/electronics10232889
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Real-time object detection is a challenging but crucial task for autonomous underwater vehicles because of the complex underwater imaging environment. Owing to scattering by suspended particles and wavelength-dependent light attenuation, underwater images are typically hazy and color-distorted. To overcome the difficulties these degradations pose for underwater object detection, an end-to-end CNN that combines U-Net and MobileNetV3-SSDLite is proposed. Furthermore, the FPGA implementations of the various convolution types in the proposed network are optimized based on the Winograd algorithm. An efficient upsampling engine is presented, and the FPGA implementation of the squeeze-and-excitation module in MobileNetV3 is optimized. The accelerator is implemented on a Zynq XC7Z045 device running at 150 MHz and achieves 23.68 frames per second (fps) and 33.14 fps when using MobileNetV3-Large and MobileNetV3-Small, respectively, as the feature extractor. Compared with a CPU, our accelerator achieves a 7.5x to 8.7x speedup and 52x to 60x higher energy efficiency.
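The abstract does not say which Winograd tile size the accelerator uses. The sketch below assumes the widely used F(2x2, 3x3) variant to illustrate why the Winograd algorithm saves multipliers on an FPGA: a 4x4 input tile and a 3x3 kernel are transformed, multiplied element-wise (16 multiplications), and transformed back into a 2x2 output tile, whereas direct convolution needs 2 x 2 x 9 = 36 multiplications for the same tile. The matrix names `B_T`, `G`, and `A_T` and all function names are illustrative choices, not taken from the paper; a hardware design would typically precompute the kernel transform offline and use fixed-point arithmetic.

```python
from typing import List

Matrix = List[List[float]]

# Transform matrices for Winograd F(2x2, 3x3): Y = A^T [(G g G^T) * (B^T d B)] A
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def winograd_f2x2_3x3(d: Matrix, g: Matrix) -> Matrix:
    """2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    U = matmul(matmul(G, g), transpose(G))          # 4x4 transformed kernel
    V = matmul(matmul(B_T, d), transpose(B_T))      # 4x4 transformed input
    # The only general multiplications: 16 element-wise products.
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(A_T, M), transpose(A_T))   # 2x2 output tile


def direct_3x3(d: Matrix, g: Matrix) -> Matrix:
    """Reference: direct sliding-window (CNN-style) convolution, 36 multiplies."""
    return [[sum(d[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]  # Sobel-like kernel
    print(winograd_f2x2_3x3(d, g))  # [[-8.0, -8.0], [-8.0, -8.0]]
    print(direct_3x3(d, g))         # [[-8, -8], [-8, -8]]
```

Both paths produce the same tile; the Winograd path replaces 20 of the 36 multiplications with additions in the input and output transforms, which are cheap in FPGA logic compared with DSP multipliers.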
Pages: 15