FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit

Cited by: 21
Authors
Cho, Mannhee [1]
Kim, Youngmin [2]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea
[2] Hongik Univ, Sch Elect & Elect Engn, Seoul 04066, South Korea
Funding
National Research Foundation of Singapore;
Keywords
convolutional neural network; FPGA; high-level synthesis; accelerator;
DOI
10.3390/electronics10222859
Chinese Library Classification
TP [Automation technology; Computer technology];
Discipline Code
0812;
Abstract
Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented using a high-level synthesis tool on a Xilinx FPGA. The proposed accelerator applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% lower network latency compared to a floating-point design, and its resource utilization is optimized to use 78% fewer DSP blocks compared to general fixed-point designs.
Pages: 16