Object Detection in Remote Sensing Images by Fusing Multi-neuron Sparse Features and Hierarchical Depth Features

Cited: 0
Authors
Gao P. [1 ]
Cao X. [1 ]
Li K. [1 ]
You X. [1 ]
Affiliations
[1] Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou
Funding
National Natural Science Foundation of China
Keywords
atrous convolution; convolutional neural network; hierarchical depth feature; multi-branching structures; multi-scale objectives; receptive field; remote sensing image; sparse feature;
DOI
10.12082/dqxxkx.2023.220708
Abstract
Object detection in remote sensing images is of great significance to urban planning, natural resource survey, land surveying, and other fields. The rapid development of deep learning has greatly improved the accuracy of object detection. However, object detection in remote sensing images still faces many challenges, such as multi-scale objects, appearance ambiguity, and complex backgrounds. Remote sensing datasets exhibit a large range of object sizes (object resolutions range from a dozen to hundreds of pixels); high background complexity, as images are acquired with full-time geographic information; high appearance similarity between different target classes; and large diversity within classes. To address these problems, this paper proposes a deep convolutional network architecture that fuses a Multi-Neuron Sparse feature extraction Block (MNB) and a Hierarchical Depth Feature fusion Block (HDFB). The MNB uses multiple convolutional branches to simulate the multiple synaptic structures of a neuron and extract sparsely distributed features; as network layers are stacked, it captures sparse features over a progressively larger receptive field, improving the quality of the captured multi-scale target features. The HDFB extracts contextual features at different depths based on atrous convolution and then fuses them through a multi-receptive-field depth feature fusion network, realizing the fusion of local and global features at the feature-map level. Experiments are conducted on the large-scale public DIOR dataset.
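The abstract does not specify the branch configuration or dilation rates of the MNB and HDFB; as a minimal sketch under illustrative assumptions (kernel sizes and dilation rates below are not from the paper), the receptive-field growth that stacked atrous convolutions provide can be computed as follows:

```python
def effective_receptive_field(layers):
    """Effective receptive field (in pixels) of stacked conv layers.

    Each layer is a (kernel_size, dilation) pair; stride 1 is assumed.
    A dilated (atrous) k x k kernel spans (k - 1) * d + 1 pixels, and
    stacking layers grows the field additively, which is how deeper
    layers come to see a larger context without extra parameters.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Three stacked 3x3 convolutions with dilation rates 1, 2, 4:
print(effective_receptive_field([(3, 1), (3, 2), (3, 4)]))  # -> 15
```

The same three layers without dilation would cover only 7 pixels, which illustrates why atrous convolution is attractive for capturing multi-scale context cheaply.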
The results show that: (1) the overall accuracy of the method reaches 72.5%, and the average detection time for a single remote sensing image is 3.8 ms; the method achieves better detection accuracy than other state-of-the-art methods for multi-scale objects with high appearance similarity against complex backgrounds; (2) the MNB improves detection accuracy for multi-scale and appearance-ambiguous targets: compared with detection using step-wise branches, overall accuracy improves by 5.8%, and summing the outputs of the branches achieves better feature fusion; (3) the HDFB extracts hierarchical features through its hierarchical depth feature fusion module, which offers a new way to fuse local and global features at the feature-map level and improves the network's ability to fuse contextual information; (4) the reconstructed PANet feature fusion network fuses sparse features at different scales with the multi-neuron sparse feature extraction module, which effectively improves the PANet structure's performance in remote sensing object detection tasks. Many factors influence the final performance of the algorithm. On the one hand, high-quality datasets are the basis of higher accuracy; image quality, target occlusion, and large intra-class variability profoundly affect detector training. On the other hand, model parameter settings are key to ensuring accuracy, e.g., clustering the dataset to obtain bounding-box priors that improve the best recall, and choosing the receptive field range of the hierarchical depth feature fusion module. We conclude that the Multi-Neuron Sparse feature extraction Block improves feature quality, while the Hierarchical Depth Feature fusion Block fuses contextual information and reduces the impact of complex background noise, resulting in better performance in remote sensing object detection tasks.
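The results credit element-wise summation of branch outputs with better feature fusion than step-wise alternatives. A minimal pure-Python sketch of this fusion over flattened feature maps (the shapes and values are illustrative, not taken from the paper):

```python
def fuse_branches_sum(branch_outputs):
    """Fuse multi-branch feature maps by element-wise summation.

    Each branch output is a flat list of activations of equal length.
    Summation keeps the feature dimension fixed, whereas concatenation
    would multiply it by the number of branches and require an extra
    projection to restore the channel count.
    """
    fused = [0.0] * len(branch_outputs[0])
    for out in branch_outputs:
        fused = [f + v for f, v in zip(fused, out)]
    return fused

# Three branches emitting 4-element feature vectors:
branches = [[1.0, 2.0, 3.0, 4.0],
            [0.5, 0.5, 0.5, 0.5],
            [1.0, 1.0, 1.0, 1.0]]
print(fuse_branches_sum(branches))  # -> [2.5, 3.5, 4.5, 5.5]
```

Because the fused map has the same shape as each branch output, downstream layers need no changes when branches are added or removed, which is one practical reason sum fusion is common in multi-branch designs.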
© 2023 Journal of Geo-Information Science. All rights reserved.
Pages: 638-653
Page count: 15