Hardware acceleration of YOLOv7-tiny using high-level synthesis tools

Cited by: 0

Authors:

Adib Hosseiny

Hadi Jahanirad

Affiliations:

[1] University of Kurdistan, Department of Electronics and Communication Engineering

Source:

Journal of Real-Time Image Processing | 2023 / Vol. 20

Keywords:

High level synthesis; Convolutional neural network; Object detection; FPGA; YOLO

DOI:

Not available

Abstract:

FPGAs have emerged as a promising platform for implementing neural networks due to their reconfigurability, parallelism, and low power consumption. Nonetheless, designing and optimizing FPGA-based neural network accelerators with register transfer level (RTL) languages is a complex and time-consuming task. High-level synthesis (HLS) tools provide a higher level of abstraction for FPGA design, enabling designers to concentrate on top-level design aspects, such as algorithms, rather than low-level hardware implementation details. Among the state-of-the-art object detection networks is the You Only Look Once (YOLO) series, which combines several neural network techniques, including cross-stage connections and feature extraction techniques such as pyramid networks. In this paper, we propose a method for implementing the YOLOv7-tiny network on FPGAs using HLS tools. We present a comprehensive analysis of the performance and resource utilization of FPGA-based neural network accelerators. Our method shows strong results on real-time application requirements such as latency. Specifically, our work reduces the usage of digital signal processing (DSP) units by 90% and saves up to 60% of flip-flops compared to state-of-the-art designs, while achieving competitive usage of block RAM (BRAM) and look-up tables. Additionally, the achieved design latency of 15 ms is well suited to real-time applications. We also propose a method for BRAM utilization and off-chip memory access.
