Small object detection model for UAV aerial image based on YOLOv7

Cited by: 7

Authors:

Chen, Jinguang [1]

Wen, Ronghui [1]

Ma, Lili [1]

Affiliations:

[1] Xian Polytech Univ, Sch Comp Sci, Shaanxi Key Lab Clothing Intelligence, Xian 710048, Peoples R China

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2024, Vol. 18, No. 3

Keywords:

UAV image detection; Small object detection; YOLOv7; Swin transformer; Detection head

DOI:

10.1007/s11760-023-02941-0

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Target detection in Unmanned Aerial Vehicle (UAV) aerial images mainly faces two problems: small targets and target occlusion. To improve detection accuracy while maintaining efficiency, this work introduces SOD-YOLOv7, a small object detection model for UAV aerial images based on the real-time detector YOLOv7. To address small object detection, we design a module that combines Swin Transformer and convolution to better capture the global context of small objects in the image, and we introduce the Bi-Level Routing Attention (BRA) mechanism to enhance the model's focus on small objects. To improve detection at multiple scales, we add detection branches. To handle occluded objects, we incorporate a dynamic detection head with deformable convolution and attention mechanisms, which enhances the model's spatial awareness of targets. Experimental results on the VisDrone and CARPK UAV image datasets show that the mean average precision (mAP@0.5) of our model reaches 53.2% and 98.5%, respectively. Compared to the original YOLOv7, our model achieves an improvement of 4.3% and 0.3%, respectively, demonstrating better performance in detecting small objects. The code will soon be released at https://github.com/Gentle-Hui/SOD-YOLOv7.
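
The abstract reports results as mAP@0.5. By the usual convention, the "@0.5" suffix means a predicted box counts as a correct detection only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates only that matching test and why it is demanding for small objects. The function names `iou` and `is_true_positive` are hypothetical and are not taken from the SOD-YOLOv7 repository. A full mAP computation also ranks predictions by confidence and averages precision over recall levels and classes, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Assumed matching rule behind the '@0.5' suffix: IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


# Illustrative 10x10 pixel ground-truth box (made-up numbers, not dataset values).
gt = (0, 0, 10, 10)
shift_3px = (3, 0, 13, 10)   # prediction offset by 3 pixels
shift_4px = (4, 0, 14, 10)   # prediction offset by 4 pixels

print(is_true_positive(shift_3px, gt))  # IoU = 70/130, above 0.5
print(is_true_positive(shift_4px, gt))  # IoU = 60/140, below 0.5
```

In this made-up case a single extra pixel of localization error turns a hit into a miss, which is one reason small objects are harder to score well on under an IoU-based metric.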


Pages: 2695-2707

Page count: 13

