Hardware acceleration of YOLOv7-tiny using high-level synthesis tools

Cited by: 0
Authors
Adib Hosseiny
Hadi Jahanirad
Affiliation
[1] University of Kurdistan, Department of Electronics and Communication Engineering
Source
Journal of Real-Time Image Processing | 2023 / Vol. 20
Keywords
High-level synthesis; Convolutional neural network; Object detection; FPGA; YOLO
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
FPGAs have emerged as a promising platform for implementing neural networks due to their reconfigurability, parallelism, and low power consumption. Nonetheless, designing and optimizing FPGA-based neural network accelerators with register transfer level (RTL) languages is a complex and time-consuming task. High-level synthesis (HLS) tools provide a higher level of abstraction for FPGA design, enabling designers to concentrate on top-level design aspects, such as algorithms, rather than low-level hardware implementation details. Among state-of-the-art object detection networks is the You Only Look Once (YOLO) series, which combines several neural network techniques, including cross-stage connections and feature extraction methods such as pyramid networks. In this paper, we propose a method for implementing the YOLOv7-tiny network on FPGAs using HLS tools. We present a comprehensive analysis of the performance and resource utilization of FPGA-based neural network accelerators. Our methods perform well on real-time application requirements such as latency. Specifically, our work reduces the usage of digital signal processing (DSP) units by 90% and saves up to 60% of flip-flops compared to state-of-the-art designs, while achieving competitive usage of block RAM (BRAM) and look-up tables. The achieved design latency of 15 ms is well suited to real-time applications. We also propose a method for BRAM utilization and off-chip memory access.