UAV-YOLOv5: A Swin-Transformer-Enabled Small Object Detection Model for Long-Range UAV Images

Cited by: 0
Authors
Li J. [1, 2]
Xie C. [1, 2]
Wu S. [1, 2]
Ren Y. [1, 2]
Affiliations
[1] Artificial Intelligence Security Innovation Team, Beijing Information Science and Technology University, Beijing
[2] School of Information Management, Beijing Information Science and Technology University, Beijing
Keywords
Deep learning; Small object detection; Swin Transformer; UAV detection; YOLOv5
DOI
10.1007/s40745-024-00546-z
Abstract
This paper tackles the low recognition accuracy and occlusion problems that arise when detecting long-range, very small targets such as UAVs. We introduce a detection framework named UAV-YOLOv5, which combines the strengths of Swin Transformer V2 and YOLOv5. First, we introduce Focal-EIOU, a refinement of the K-means algorithm tailored to generate anchor boxes better suited to the current dataset, thereby improving detection performance. Second, the convolutional and pooling layers in the network with a stride greater than 1 are replaced to prevent information loss during feature extraction. Third, the Swin Transformer V2 module is introduced in the Neck to improve the accuracy of the model, and the BiFormer module is introduced so that the model can acquire global and local feature information at the same time. Fourth, BiFPN replaces the original FPN structure so that the network can acquire richer semantic information and fuse features across scales more effectively. Finally, a small-target detection head is added to the existing architecture, improving the model’s precision on smaller targets. Experiments on the comprehensive dataset verify the effectiveness of UAV-YOLOv5, which achieves an average accuracy of 87%. Compared with YOLOv5, the mAP of UAV-YOLOv5 is improved by 8.5%, which verifies that it is capable of high-precision optoelectronic detection of long-range, small-target UAVs. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
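The anchor-box step mentioned in the abstract can be illustrated with a minimal sketch of the K-means anchor clustering commonly used with YOLO-family detectors. The abstract does not specify how the authors modify K-means, so this shows only the conventional baseline with a 1 - IoU distance. The function names (`wh_iou`, `kmeans_anchors`) and the sample box sizes are hypothetical and not taken from the paper.

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; highest IoU means smallest distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise from k distinct positions
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most (lowest 1 - IoU).
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)
        # Move each anchor to the mean width and height of its cluster.
        new_anchors = []
        for anchor, members in zip(anchors, clusters):
            if not members:
                new_anchors.append(anchor)  # leave an empty cluster's anchor as is
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:
            break  # assignments are stable
        anchors = new_anchors
    # Return anchors ordered from smallest to largest area.
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical ground-truth box sizes in pixels (illustrative only).
boxes = [(8, 6), (9, 7), (10, 8), (30, 22), (32, 24), (34, 26),
         (90, 60), (96, 64), (100, 70)]
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
```

Like any K-means variant, the result depends on the initial anchors and may settle in a local optimum, so in practice the clustering is usually repeated with several seeds and the set with the best mean IoU is kept.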
Pages: 1109-1138
Page count: 29