A Runtime Programmable Accelerator for Convolutional and Multilayer Perceptron Neural Networks on FPGA

Cited by: 2
Authors
Kabir, Ehsan [1 ]
Poudel, Arpan [1 ]
Aklah, Zeyad [2 ]
Huang, Miaoqing [1 ]
Andrews, David [1 ]
Affiliations
[1] Univ Arkansas, CSCE Dept, Fayetteville, AR 72701 USA
[2] Univ Thi Qar, Dept Comp Sci, Nasiriyah, Iraq
Source
APPLIED RECONFIGURABLE COMPUTING. ARCHITECTURES, TOOLS, AND APPLICATIONS, ARC 2022 | 2022 / Vol. 13569
Keywords
FPGA; Neural network; MLP; CNN; Overlay; Flexible; Programmable; Reconfigurable; Accelerators; Custom hardware; ARCHITECTURE; HARDWARE;
DOI
10.1007/978-3-031-19983-7_3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural networks (DNNs) are widely used for classification, prediction, and regression tasks. Achieving good performance and accuracy on a given application requires an optimized network architecture, which is typically found by experimenting with and evaluating different network topologies. A custom hardware accelerator, however, is not scalable and lacks the flexibility to switch from one topology to another at run time. To support convolutional neural networks (CNNs) as well as multilayer perceptron neural networks (MLPNNs) of different sizes, this paper presents an FPGA accelerator architecture that can be programmed at run time. This combined CNN and MLP accelerator (CNN-MLPA) can run any CNN or MLPNN application without re-synthesis, so the time otherwise spent on synthesis, placement, and routing is saved when executing different applications on the proposed architecture. Run-time results show that the CNN-MLPA supports network topologies of different sizes without significant performance degradation. We evaluated resource utilization and execution time on a Xilinx Virtex-7 FPGA board for several benchmark datasets to demonstrate that the design is run-time programmable, portable, and scalable to any FPGA. The accelerator was then optimized to increase throughput through pipelining and concurrency, and to reduce resource consumption through fixed-point operations.
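The fixed-point optimization mentioned at the end of the abstract can be illustrated with a minimal sketch: quantizing weights and activations to a signed Qm.n integer format lets a multiply-accumulate (MAC) unit use cheap integer arithmetic instead of floating point. The Q4.12 format and function names below are assumptions for illustration only, not the paper's actual implementation.

```python
# Hypothetical sketch of fixed-point MAC, assuming a Q4.12 format
# (4 integer bits, 12 fractional bits) -- not the paper's actual choice.

FRAC_BITS = 12            # fractional bits n in Qm.n
SCALE = 1 << FRAC_BITS    # 4096

def to_fixed(x: float) -> int:
    """Quantize a float to a Q4.12 fixed-point integer (round to nearest)."""
    return int(round(x * SCALE))

def fixed_mac(weights, activations) -> float:
    """Multiply-accumulate entirely in integers, then rescale to float."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += to_fixed(w) * to_fixed(a)  # each product carries 2*FRAC_BITS
    return acc / (SCALE * SCALE)          # undo the doubled scaling

w = [0.5, -0.25, 1.0]
a = [0.1, 0.2, 0.3]
print(fixed_mac(w, a))  # close to the float dot product, 0.3
```

On an FPGA the integer products map to DSP slices, which is where the resource savings over floating-point arithmetic come from; the quantization error (here on the order of 1e-4) is the accuracy trade-off.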
Pages: 32-46
Page count: 15