Remote sensing image target detection combining multi-scale and attention mechanism

Cited by: 2
Authors
Zhang Y.-Z. [1, 2]
Guo W. [1]
Cai Z.-Q. [3]
Li W.-B. [1]
Affiliations
[1] School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang
[2] Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang Tiedao University, Shijiazhuang
[3] Shanwei Institute of Technology, Shanwei
Keywords
attention module; feature fusion; multi-scale feature; non-maximum suppression; remote sensing image; target detection; YOLOv5s algorithm
DOI
10.3785/j.issn.1008-973X.2022.11.012
Abstract
Remote sensing images often have complex backgrounds, large variations in target scale, and densely distributed targets, which cause existing algorithms to perform poorly. A remote sensing image object detection algorithm combining multi-scale features and attention mechanisms was proposed. The atrous spatial pyramid pooling (ASPP) module was improved to provide receptive fields suited to images of different sizes. An attention module that learns both the channel information and the spatial location information of feature maps was proposed to strengthen feature extraction for target regions in remote sensing images with complex backgrounds. A weighted bidirectional feature pyramid network (BiFPN) structure was introduced and combined with the backbone network to improve the fusion of multi-level features. A distance-based non-maximum suppression (NMS) method was used for postprocessing, which reduced the tendency of detection boxes to overlap. Experimental results on the DIOR and NWPU VHR-10 datasets showed that the mean average precision (mAP) of the proposed algorithm reached 71.6% and 91.6%, respectively, which were 2.9% and 1.5% higher than those of the mainstream YOLOv5s algorithm. The algorithm achieved good detection results on complex remote sensing images. © 2022 Zhejiang University. All rights reserved.
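The abstract does not specify which distance-based NMS is used in postprocessing. A common choice is DIoU-NMS, introduced alongside the Distance-IoU loss. It suppresses a lower-scoring box only when its IoU with an already-kept box, minus the squared distance between the two box centres divided by the squared diagonal of their smallest enclosing box, reaches a threshold. Boxes whose centres lie farther apart, as with adjacent densely packed targets, are therefore less likely to be removed. The sketch below assumes that variant. The function names `diou` and `diou_nms`, the `(x1, y1, x2, y2)` box format, and the threshold values are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1
Box = Tuple[float, float, float, float]


def diou(a: Box, b: Box) -> float:
    """DIoU = IoU - (squared centre distance / squared enclosing-box diagonal)."""
    # Intersection rectangle (zero area if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared Euclidean distance between the two box centres
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    rho2 = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both boxes
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    return iou - (rho2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes: Sequence[Box], scores: Sequence[float],
             threshold: float = 0.5) -> List[int]:
    """Greedy NMS using DIoU as the suppression criterion.

    Returns the indices of kept boxes in descending score order.
    """
    # Visit candidates from the highest score to the lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes whose DIoU with the kept box reaches the threshold
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 2, 12, 12)]
    scores = [0.9, 0.8, 0.7]
    print(diou_nms(boxes, scores, threshold=0.45))  # [0, 2]
```

In this toy case the third box has an IoU of about 0.47 with the first box, so plain IoU-NMS at a threshold of 0.45 would discard it. Its centre offset lowers the DIoU to about 0.44, so DIoU-NMS keeps it. The second box overlaps the first almost completely and is suppressed by both criteria.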
Pages: 2215-2223
Page count: 8