Design and FPGA implementation of fast convolution algorithm based on 3D-Winograd

Cited by: 0
Authors
Lin K. [1 ]
Jiang H. [1 ]
Zhang Y. [1 ]
Cong R. [1 ]
Affiliations
[1] Beijing Key Laboratory of Digital Media, Beihang University, Beijing
Source
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics | 2021, Vol. 47, No. 9
Funding
National Natural Science Foundation of China
Keywords
Convolution algorithm; Convolutional Neural Network(CNN); Fast algorithm; FPGA; Winograd;
DOI
10.13700/j.bh.1001-5965.2020.0310
Abstract
In recent years, Convolutional Neural Networks (CNNs) have been widely adopted for computer vision tasks. Owing to their high performance, energy efficiency, and reconfigurability, FPGAs are regarded as one of the most promising CNN hardware accelerators. However, existing FPGA solutions based on the traditional Winograd method are usually limited by FPGA computing power and on-chip storage, and their performance on 3D convolution operations leaves room for improvement. This paper first studies how the one-dimensional Winograd algorithm is expanded to three-dimensional convolution; it then improves CNN performance on FPGA by enlarging the input feature map tile and convolution block dimensions processed at a time and by quantizing weights and input data to low bit widths. The optimization consists of four parts: replacing part of the divisions with shifts, the tiling scheme, the expansion from two dimensions to three, and low-bit quantization. Compared with the traditional two-dimensional Winograd algorithm, the optimized algorithm reduces the number of clock cycles of each convolutional layer by about a factor of 7, and it achieves a reduction of about the same factor per convolutional layer relative to the traditional sliding-window convolution algorithm. The study shows that the 3D-Winograd algorithm based on one-dimensional expansion can greatly reduce computational complexity and improve the performance of running CNNs on FPGA. © 2021, Editorial Board of JBUAA. All rights reserved.
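The four optimizations above build on the standard one-dimensional Winograd building block F(2,3) (Lavin and Gray's fast convolution algorithm), which the paper expands along the three axes of a convolution tile. The following minimal sketch (Python/NumPy, written here for illustration and not taken from the paper's FPGA design) shows the F(2,3) kernel; the 1/2 entries in the filter-transform matrix G are presumably the divisions that the shift-based optimization replaces with arithmetic shifts in fixed-point hardware.

```python
# Minimal sketch of the 1D Winograd F(2,3) kernel that 3D-Winograd builds on:
# 2 outputs of a 3-tap convolution from a 4-sample tile using 4 multiplications
# instead of 6.  Illustrative only; the matrices follow the standard
# Lavin-Gray F(2,3) formulation, not the paper's FPGA implementation.
import numpy as np

# Transform matrices for F(2,3): y = A^T [ (G g) * (B^T d) ]
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],    # the 1/2 factors are the divisions the paper
              [0.5, -0.5, 0.5],    # replaces with right shifts in fixed point
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f23_1d(d, g):
    """Two outputs of a valid 1D convolution (correlation) of a 4-sample tile d
    with a 3-tap filter g, computed via the F(2,3) Winograd transforms."""
    U = G @ g             # filter transform (precomputed once per filter)
    V = BT @ d            # input-tile transform (additions/subtractions only)
    return AT @ (U * V)   # 4 element-wise multiplies + output transform

# Sanity check against direct sliding-window convolution
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(winograd_f23_1d(d, g), direct)   # both give [6, 9]
```

Nesting this 1D transform along each of the three axes of an input tile yields the separable 3D-Winograd form referred to in the abstract.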
Pages: 1900-1907
Page count: 7