FPGA-based Convolutional Neural Network Design and Implementation

Cited by: 0

Authors:

Yan, Ruitao [1]

Yi, Jianjun [1]

He, Jie [1]

Zhao, Yifan [1]

Affiliations:

[1] East China Univ Sci & Technol, Dept Mech Engn, Shanghai, Peoples R China

Source:

2023 3RD ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE, ACCTCS | 2023

Funding:

National Natural Science Foundation of China, Major Program;

Keywords:

FPGA; convolution neural network; hardware acceleration; yolov5s;

DOI:

10.1109/ACCTCS58815.2023.00058

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep convolutional neural networks have prominent advantages in fields such as image recognition and natural language processing, but because of their high storage costs and heavy computational load, they are mostly deployed on servers with GPU acceleration. As autonomous driving, aerospace, and other industries evolve, some scenarios place stricter requirements on real-time detection. In these scenarios it is not feasible to detect targets by sending video streams to a server for inference and waiting for the results to be returned, so low-power hardware acceleration options for neural networks must be investigated. In this paper, we propose an FPGA-based dedicated accelerator for convolutional neural networks. To support the parallel execution of each convolution module, we analyze the computational properties of neural networks and design a convolution computing structure with deep pipelining and high parallelism. In addition, each convolution layer is internally divided into multiple computational units along the channel direction to further increase computational parallelism. We implement an on-board test of a YOLOv5s-based neural network on the Xilinx XC7Z100 platform. According to the experimental results, the design achieves a speedup of up to 142 times over an 800 MHz ARM Cortex-A9 while consuming only 4.5 W, a significant performance gain at low power consumption.
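The channel-direction partitioning described in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the abstract does not say whether the split is along input or output channels, so the sketch assumes an output-channel split, and the names `conv2d`, `conv2d_channel_split`, and `num_units` are hypothetical. Integer arithmetic stands in for fixed-point hardware so that the partitioned result can be compared exactly with the unpartitioned one.

```python
def conv2d(x, w):
    """Direct convolution, stride 1, no padding.

    x: input  feature maps, indexed [c_in][row][col]
    w: filters,             indexed [c_out][c_in][i][j] (square K x K kernels)
    Returns output feature maps indexed [c_out][row][col].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for f in w:  # one output feature map per filter
        fmap = [[sum(f[c][i][j] * x[c][r + i][s + j]
                     for c in range(c_in)
                     for i in range(k)
                     for j in range(k))
                 for s in range(wd - k + 1)]
                for r in range(h - k + 1)]
        out.append(fmap)
    return out


def conv2d_channel_split(x, w, num_units):
    """Assumed scheme: assign output channels to num_units compute units.

    Each unit handles an interleaved subset of the filters. The units are
    independent, so in hardware they could run in parallel; here they are
    simply executed one after another.
    """
    out = [None] * len(w)
    for u in range(num_units):
        idx = list(range(u, len(w), num_units))  # channels owned by unit u
        if not idx:
            continue  # more units than output channels
        part = conv2d(x, [w[o] for o in idx])
        for o, fmap in zip(idx, part):
            out[o] = fmap
    return out


# Small deterministic example: 2 input channels, 4x4 maps, four 3x3 filters.
x = [[[(c + 2 * r + s) % 5 for s in range(4)] for r in range(4)]
     for c in range(2)]
w = [[[[(o + c + i - j) % 3 - 1 for j in range(3)] for i in range(3)]
      for c in range(2)] for o in range(4)]

reference = conv2d(x, w)
split = conv2d_channel_split(x, w, num_units=2)
print(split == reference)
```

Because every output channel depends only on its own filter and the shared input, the split changes how the work is scheduled, not the result, which is why such units can be replicated to raise parallelism.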

Citation:

Pages: 456-460

Page count: 5

References (8 in total):

[1] An OpenCL™ Deep Learning Accelerator on Arria 10 [J].

Aydonat, Utku ;

O'Connell, Shane ;

Capalija, Davor ;

Ling, Andrew C. ;

Chiu, Gordon R. .

FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, :55-64

[2] ShiDianNao: Shifting Vision Processing Closer to the Sensor [J].

Du, Zidong ;

Fasthuber, Robert ;

Chen, Tianshi ;

Ienne, Paolo ;

Li, Ling ;

Luo, Tao ;

Feng, Xiaobing ;

Chen, Yunji ;

Temam, Olivier .

2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2015, :92-104

[3]

Han S, 2015, ADV NEUR IN, V28

[4] Going Deeper with Embedded FPGA Platform for Convolutional Neural Network [J].

Qiu, Jiantao ;

Wang, Jie ;

Yao, Song ;

Guo, Kaiyuan ;

Li, Boxun ;

Zhou, Erjin ;

Yu, Jincheng ;

Tang, Tianqi ;

Xu, Ningyi ;

Song, Sen ;

Wang, Yu ;

Yang, Huazhong .

PROCEEDINGS OF THE 2016 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'16), 2016, :26-35

[5] XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks [J].

Rastegari, Mohammad ;

Ordonez, Vicente ;

Redmon, Joseph ;

Farhadi, Ali .

COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908 :525-542

[6]

Simonyan K, 2015, Arxiv, DOI arXiv:1409.1556

[7] Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks [J].

Suda, Naveen ;

Chandra, Vikas ;

Dasika, Ganesh ;

Mohanty, Abinash ;

Ma, Yufei ;

Vrudhula, Sarma ;

Seo, Jae-Sun ;

Cao, Yu .

PROCEEDINGS OF THE 2016 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'16), 2016, :16-25

[8] A Framework for Generating High Throughput CNN Implementations on FPGAs [J].

Zeng, Hanqing ;

Chen, Ren ;

Zhang, Chi ;

Prasanna, Viktor .

PROCEEDINGS OF THE 2018 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'18), 2018, :117-126
