Using Fermat Number Transform to Accelerate Convolutional Neural Network

Cited: 0
Authors
Xu, Weihong [1 ,2 ]
You, Xiaohu [2 ]
Zhang, Chuan [1 ,2 ]
Affiliations
[1] Southeast University, Laboratory of Efficient Architectures for Digital Communication and Signal Processing, Nanjing, Jiangsu, People's Republic of China
[2] Southeast University, National Mobile Communications Research Laboratory, Nanjing, Jiangsu, People's Republic of China
Source
2017 IEEE 12th International Conference on ASIC (ASICON), 2017
Keywords
Fermat number transform (FNT); Overlap-and-Add (OaA); pipelining; convolutional neural network (CNN)
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Code
0808; 0809
Abstract
The convolutional neural network (CNN) has achieved significant breakthroughs in image recognition and natural language processing. However, CNNs still suffer from prohibitive computational complexity, and many efforts have been made to reduce the arithmetic complexity of direct convolution. Aiming at lower multiplication complexity, this paper proposes an efficient convolution architecture based on the Fermat number transform (FNT). We first present the FNT algorithm and the Overlap-and-Add (OaA) method. To handle the convolutions arising in CNNs, the FNT is extended to two dimensions (2D FNT) and the corresponding calculation methodology is introduced. A pipelined FNT architecture is also proposed for efficient realization. An overall CNN accelerator architecture is then built on the OaA FNT. Complexity analysis demonstrates that the proposed OaA FNT convolution reduces multiplication complexity by 5.59x and 2.56x compared with direct convolution and the OaA FFT method, respectively. We also evaluate the design on a Xilinx Virtex-7 XC7VX485T FPGA and compare it with prior work in terms of convolution throughput and resource efficiency (GOP/s/DSP). The proposed architecture achieves 1.41x the convolution throughput of the state-of-the-art design with 3.05x fewer DSPs, a 4.29x improvement in resource efficiency.
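To make the abstract's method concrete, below is a minimal Python sketch of FNT-based linear convolution with an Overlap-and-Add wrapper. It is an illustration under stated assumptions, not the paper's design: the paper's pipelined hardware transform is replaced by a naive O(N^2) software loop, the modulus is fixed to the Fermat prime F_4 = 2^16 + 1 with root alpha = 2 (transform length 32), and all function names here are invented for the example.

```python
# Sketch of exact convolution via the Fermat Number Transform (FNT), in the
# style of Agarwal-Burrus. Assumptions: modulus F_4 = 2^16 + 1 = 65537 (a
# Fermat prime), root alpha = 2 with multiplicative order 32 mod F_4, so the
# transform length N is fixed at 32. Names are illustrative, not the paper's.

P = (1 << 16) + 1   # Fermat number F_4 = 65537 (prime)
ALPHA = 2           # 2^16 = -1 (mod F_4), so 2 is a primitive 32nd root of unity
N = 32              # transform length supported by alpha = 2

def fnt(x, root):
    """Naive O(N^2) number-theoretic transform over Z_P. A hardware design
    would use a radix-2 butterfly instead: with alpha = 2, the twiddle
    multiplications reduce to shifts, which is the FNT's main appeal."""
    return [sum(v * pow(root, n * k, P) for n, v in enumerate(x)) % P
            for k in range(N)]

def ifnt(X):
    """Inverse transform: same kernel with alpha^-1, scaled by N^-1 mod P."""
    inv_n = pow(N, P - 2, P)          # N^-1 via Fermat's little theorem
    inv_root = pow(ALPHA, P - 2, P)   # alpha^-1 mod P
    return [(inv_n * v) % P for v in fnt(X, inv_root)]

def fnt_conv(a, b):
    """Exact linear convolution (results must stay below P): zero-pad to N,
    transform, multiply pointwise, inverse-transform, truncate."""
    assert len(a) + len(b) - 1 <= N
    A = fnt(a + [0] * (N - len(a)), ALPHA)
    B = fnt(b + [0] * (N - len(b)), ALPHA)
    y = ifnt([(u * v) % P for u, v in zip(A, B)])
    return y[:len(a) + len(b) - 1]

def oaa_conv(x, h, block=16):
    """Overlap-and-Add: tile a long input into short blocks, convolve each
    block with the kernel via the FNT, and add the overlapping tails."""
    out = [0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        for i, v in enumerate(fnt_conv(x[start:start + block], h)):
            out[start + i] += v
    return [v % P for v in out]

# Cross-check against direct convolution.
x, h = list(range(1, 11)), [1, 2, 1]
direct = [sum(x[i] * h[k - i] for i in range(len(x)) if 0 <= k - i < len(h))
          for k in range(len(x) + len(h) - 1)]
assert oaa_conv(x, h, block=4) == direct
```

Because all arithmetic is exact modulo F_4 and the twiddle factors are powers of 2, an FNT convolver incurs no rounding error and can replace most multiplications with shifts, which is the source of the multiplication savings the abstract quantifies relative to the OaA FFT approach.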
Pages: 1033-1036
Page count: 4