Geospatial Transformer Is What You Need for Aircraft Detection in SAR Imagery

Cited by: 35
Authors
Chen, Lifu [1, 2]
Luo, Ru [1, 2]
Xing, Jin [3]
Li, Zhenhong [4]
Yuan, Zhihui [1, 2]
Cai, Xingmin [1, 2]
Affiliations
[1] Changsha Univ Sci & Technol, Sch Elect & Informat Engn, Changsha 410014, Peoples R China
[2] Changsha Univ Sci & Technol, Lab Radar Remote Sensing Applicat, Changsha 410014, Peoples R China
[3] Newcastle Univ, Sch Engn, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[4] Changan Univ, Coll Geol Engn & Geomat, Xian 710054, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60
Funding
National Natural Science Foundation of China;
Keywords
Aircraft; Radar polarimetry; Geospatial analysis; Feature extraction; Synthetic aperture radar; Deep learning; Convolution; Aircraft detection; convolutional neural network (CNN); deep learning; geospatial contextual attention; synthetic aperture radar (SAR);
DOI
10.1109/TGRS.2022.3162235
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708; 070902;
Abstract
Although deep learning techniques have achieved noticeable success in aircraft detection, scale heterogeneity, position differences, complex background interference, and speckle noise keep aircraft detection in large-scale synthetic aperture radar (SAR) images challenging. To solve these problems, we propose the geospatial transformer framework and implement it as a three-step target detection neural network: image decomposition, the multiscale geospatial contextual attention network (MGCAN), and result recomposition. First, the given large-scale SAR image is decomposed into slices via sliding windows according to the image characteristics of the aircraft. Second, the slices are input into MGCAN for feature extraction, and cluster distance nonmaximum suppression (CD-NMS) is used to determine the bounding boxes of aircraft. Finally, the detection results are produced via recomposition. Two innovative geospatial attention modules are proposed within MGCAN, namely, the efficient pyramid convolution attention fusion (EPCAF) module and the parallel residual spatial attention (PRSA) module, to extract multiscale features of the aircraft and suppress background noise. In the experiment, four large-scale SAR images with 1-m resolution from the Gaofen-3 system, none of which are included in the dataset, are tested. The results indicate that the detection performance of our geospatial transformer is better than that of Faster R-CNN, SSD, EfficientDet-D0, and YOLOv5s. The geospatial transformer integrates deep learning with SAR target characteristics to fully capture the multiscale contextual information and geospatial information of aircraft, effectively reduces complex background interference, and tackles the position difference of targets. It greatly improves the detection performance of aircraft and offers an effective approach to merge SAR domain knowledge with deep learning techniques.
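The decompose / detect / recompose flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the MGCAN detector is not modeled at all, plain IoU-based nonmaximum suppression stands in for CD-NMS (whose details the abstract does not give), and the window size, stride, IoU threshold, and all function names (`slide_windows`, `to_global`, `recompose`) are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2, score) in pixel coordinates.
Box = Tuple[float, float, float, float, float]
Offset = Tuple[int, int]  # (x0, y0) top-left corner of a slice in the large image


def slide_windows(height: int, width: int, size: int, stride: int) -> List[Offset]:
    """Return top-left offsets of square windows that cover the whole image."""
    def starts(length: int) -> List[int]:
        if length <= size:
            return [0]
        s = list(range(0, length - size + 1, stride))
        if s[-1] != length - size:
            s.append(length - size)  # extra window so the image border is covered
        return s
    return [(x, y) for y in starts(height) for x in starts(width)]


def to_global(box: Box, offset: Offset) -> Box:
    """Shift a slice-local box into large-image coordinates."""
    x1, y1, x2, y2, score = box
    dx, dy = offset
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Standard greedy NMS (assumption: used here in place of CD-NMS)."""
    kept: List[Box] = []
    for b in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(b, k) <= iou_thresh for k in kept):
            kept.append(b)
    return kept


def recompose(per_slice: List[Tuple[Offset, List[Box]]],
              iou_thresh: float = 0.5) -> List[Box]:
    """Merge per-slice detections and drop duplicates from overlapping slices."""
    merged = [to_global(b, off) for off, boxes in per_slice for b in boxes]
    return nms(merged, iou_thresh)


# Hypothetical detections: the same aircraft seen in two overlapping slices.
detections = [
    ((0, 0), [(400.0, 100.0, 460.0, 160.0, 0.9)]),
    ((256, 0), [(146.0, 102.0, 206.0, 162.0, 0.8), (10.0, 10.0, 50.0, 50.0, 0.7)]),
]
final_boxes = recompose(detections)
```

In this sketch, overlapping windows are what make the recomposition step necessary: a target near a slice border appears in two slices, and the suppression step after mapping boxes to global coordinates keeps only the higher-scoring copy.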
Pages: 15