Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator (TRCNet) in Remote Sensing Image

Cited by: 0
Authors
Chen S. [1 ]
Wang B. [2 ]
Zhong C. [1 ]
Affiliations
[1] Hangzhou Dianzi University, Hangzhou
[2] College of Science, Beijing Forestry University, Beijing
Keywords
Remote sensing image; target detection; transformer
DOI
10.4108/ew.3404
Abstract
Optical remote sensing images (ORSI) have found broad application in areas such as urban planning, military mapping, and field surveys, and target detection is one of their most important uses. In recent years, driven by deep learning, CNN-based target detection algorithms have achieved breakthrough performance. However, because targets in ORSI vary widely in orientation and size, directly applying detection algorithms designed for ordinary optical images yields poor results, so improving the performance of object detection models on ORSI remains challenging. To address these problems, this paper builds on the one-stage detector RetinaNet and proposes a more efficient and accurate network structure: a Transformer-Based Network with Deep Feature Fusion Using the Carafe Operator (TRCNet). First, a transformer-based PVT2 structure is adopted as the backbone, whose multi-head attention mechanism captures global information in optical images with complex backgrounds; the network depth is also increased to extract stronger features. Second, the carafe operator is introduced into the FPN neck to fuse high-level semantics with low-level features more effectively, further improving detection performance. Experiments on the well-known public NWPU-VHR-10 and RSOD datasets show that mAP increases by 8.4% and 1.7%, respectively. Comparisons with other advanced networks also confirm that the proposed network is effective. © 2023 Chen et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.
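To illustrate the carafe-based feature fusion described in the abstract, the following is a minimal PyTorch sketch of a content-aware upsampling module of the CARAFE kind, written against the operator's published description (Wang et al., ICCV 2019) rather than the authors' TRCNet code; the class name, hyper-parameter defaults, and the FPN usage in the example are illustrative assumptions only.

```python
# Minimal sketch of a CARAFE-style content-aware upsampler (not the TRCNet code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CarafeUpsample(nn.Module):
    """Simplified content-aware reassembly of features.

    A small encoder predicts, for every upsampled position, a softmax-normalised
    k_up x k_up kernel that reweights the neighbourhood of the corresponding
    source pixel, replacing fixed nearest/bilinear upsampling.
    """

    def __init__(self, channels, scale=2, k_up=5, k_enc=3, c_mid=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        # 1) compress channels so the kernel predictor stays cheap
        self.compress = nn.Conv2d(channels, c_mid, kernel_size=1)
        # 2) predict scale^2 * k_up^2 reassembly weights per source location
        self.encoder = nn.Conv2d(c_mid, scale ** 2 * k_up ** 2,
                                 kernel_size=k_enc, padding=k_enc // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up

        # Kernel prediction: (B, s^2*k^2, H, W) -> (B, k^2, sH, sW), softmax over k^2
        kernels = self.encoder(self.compress(x))
        kernels = F.pixel_shuffle(kernels, s)
        kernels = F.softmax(kernels, dim=1)

        # Gather the k x k neighbourhood of every source pixel, then replicate each
        # neighbourhood to the s x s output positions it serves (nearest upsample).
        neigh = F.unfold(x, kernel_size=k, padding=k // 2)      # (B, C*k^2, H*W)
        neigh = neigh.view(b, c * k * k, h, w)
        neigh = F.interpolate(neigh, scale_factor=s, mode="nearest")
        neigh = neigh.view(b, c, k * k, s * h, s * w)

        # Content-aware reassembly: weighted sum over the neighbourhood dimension.
        return (neigh * kernels.unsqueeze(1)).sum(dim=2)        # (B, C, sH, sW)


if __name__ == "__main__":
    # e.g. upsample a 256-channel FPN level by 2x before fusing it with the
    # next-finer level in the top-down pathway.
    up = CarafeUpsample(channels=256, scale=2)
    p5 = torch.randn(1, 256, 32, 32)
    print(up(p5).shape)  # torch.Size([1, 256, 64, 64])
```

In an FPN top-down pathway, a module like this would replace the fixed nearest-neighbour upsampling applied to the coarser level before it is added to the lateral feature, which is the role the abstract attributes to the carafe operator in the TRCNet neck.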
Pages: 1-11
Page count: 10