Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA

Cited by: 3
Authors
Wang, Ling [1 ,2 ]
Zhou, Hai [1 ]
Bian, Chunjiang [1 ]
Jiang, Kangning [1 ,2 ]
Cheng, Xiaolei [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Natl Space Sci Ctr, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Keywords
convolutional neural network; remote sensing image processing; on-orbit high-performance computing; YOLOX-s; FPGA hardware acceleration; CNN;
DOI
10.3390/electronics11213473
Chinese Library Classification: TP [Automation Technology; Computer Technology]
Discipline code: 0812
Abstract
The rapid development of remote sensing technology has brought a sharp increase in the volume of remote sensing image data. However, because of a satellite's limited hardware resources and its space and power-consumption constraints, traditional remote sensing image processing methods struggle to handle massive remote sensing imagery efficiently and robustly. Moreover, as remote sensing data volumes grow, satellite-to-ground target detection imposes increasingly stringent speed and accuracy requirements. To address these problems, this paper proposes an efficient and reliable acceleration architecture for forward inference of the YOLOX-s detection network on an on-orbit FPGA. Given the limited onboard resources, a design strategy of parallel loop unrolling over the input and output channels is adopted to build the largest possible DSP computing array, ensuring reliable, full utilization of the limited computing resources and thus reducing the inference latency of the entire network. Meanwhile, a three-path cache queue and a small-scale cascaded pooling array are designed, which maximize the reuse of on-chip cached data, effectively relieve the bandwidth bottleneck of the external memory, and keep the entire computing array operating efficiently. The experimental results show that, with the VC709 running at 200 MHz, the FPGA accelerator reaches an overall inference performance of 399.62 GOPS, a peak performance of 408.4 GOPS, and an overall DSP-array computing efficiency of 97.56%. Compared with previous work, our architecture further improves computing efficiency under limited hardware resources.
Pages: 18