Object Detection of Remote Sensing Image Based on Multi-Scale Feature Fusion and Attention Mechanism

Cited by: 6

Authors:

Du, Zuoqiang [1]

Liang, Yuan [2]

Affiliations:

[1] Harbin Univ Commerce, Sch Comp & Informat Engn, Harbin 150028, Peoples R China

[2] Jinan Inspur Data Technol Co Ltd, Jinan 250000, Peoples R China

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

deep learning; object detection; remote sensing image; multi-scale feature fusion pyramid network; adaptive channel spatial attention mechanism; joint teacher knowledge distillation; classification

DOI:

10.1109/ACCESS.2024.3352601

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

Because targets in remote sensing images are small and densely distributed, this paper adds a detection head, P2, dedicated to small-scale targets on top of the three detection layers of the original YOLOv5 model, and feeds the shallow high-resolution feature map into the subsequent multi-scale feature fusion module. This effectively avoids losing key feature information of small-scale targets during repeated downsampling. Firstly, an enhanced multi-scale feature fusion pyramid network, DSI-FPN, is designed. The FPN+PAN network is optimized using depthwise separable convolution and involution operators, which require fewer parameters and computations, together with a spatial attention mechanism, to generate more informative feature maps for the detection task. Secondly, we propose an adaptive channel-spatial attention mechanism, SCBAM, which introduces self-attention into the CBAM module to add non-local information to an interaction that originally used only local information; this removes the limit imposed by the convolution kernel, expands the model's receptive field, and improves its feature expression ability. Thirdly, to address insufficient computing power when deploying the detector on devices, we propose a feature-layer-based joint-teacher knowledge distillation framework. A teacher distillation loss is designed, and the student's online learning is adjusted dynamically by balancing the contributions of the teacher network and the ground truth. The detection accuracy of the student network improves markedly, and the network's parameter count and model size are effectively reduced. Finally, compared with other remote sensing object detection methods, the experimental results show that the proposed approach detects small-scale targets in remote sensing images better under different lighting conditions. The detection accuracy reaches 43.9%, which is 7.4% higher than that of the original model. After knowledge distillation, the model parameters are reduced to 1/3 of the original, and the detection accuracy is 40.2%.
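The abstract's point that depthwise separable convolution needs fewer parameters than a standard convolution can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names and the layer sizes (256 input channels, 256 output channels, 3 x 3 kernel) are assumptions, and bias terms are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k depthwise conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration (not from the paper).
c_in, c_out, k = 256, 256, 3
standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.2f}")
```

For these assumed sizes, the separable layer uses roughly one ninth of the weights of the standard layer; the exact saving in DSI-FPN depends on the channel widths the authors actually use, which the abstract does not state.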


Pages: 8619-8632

Page count: 14

