Transformer-Based Multi-layer Feature Aggregation and Rotated Anchor Matching for Oriented Object Detection in Remote Sensing Images

Times cited: 1

Authors:

Jin, Chuan [1]

Zheng, Anqi [1]

Wu, Zhaoying [2]

Tong, Changqing [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Sci, Hangzhou 310018, Zhejiang, Peoples R China

[2] Southeast Univ, Southeast Monash Joint Grad, Suzhou 210096, Jiangsu, Peoples R China

Source:

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING | 2024, Vol. 49, No. 9

Keywords:

Remote sensing; Oriented object detection; Transformer; Multi-layer feature aggregation; Rotated anchor matching;

DOI:

10.1007/s13369-024-08892-z

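Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];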

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Object detection has made significant progress in computer vision. However, challenges remain in detecting small, arbitrarily oriented, and densely distributed objects, especially in aerial remote sensing images. This paper presents MATDet, an end-to-end, Transformer-based encoder-decoder network designed for oriented object detection. The network employs multi-layer feature aggregation and rotated anchor matching to improve detection accuracy for small, oriented, and densely distributed objects. Specifically, the encoder encodes labeled image blocks using convolutional neural network (CNN) feature maps. It efficiently fuses these blocks with higher-resolution multi-scale features through cross-layer connections, facilitating the extraction of global contextual information. The decoder then upsamples the encoded features, recovering the full spatial resolution of the feature maps to capture the local-global semantic features needed for accurate object localization. In addition, high-quality proposed anchor boxes are generated by refined convolution, and the convolved features are adaptively aligned according to the anchor boxes to reduce redundant computation. The proposed MATDet achieves mAPs of 80.35%, 78.83%, 73.60%, and 98.01% on the DOTAv1.0, DOTAv1.5, DIOR, and HRSC2016 datasets, respectively, outperforming the baseline model for oriented object detection. These results confirm the feasibility and effectiveness of the proposed methods.
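The abstract refers to rotated anchor boxes without stating how a box is parameterized. As a minimal sketch, assuming the common (cx, cy, w, h, theta) convention with theta in radians, the snippet below converts one oriented box to its four corner points. The function name `rotated_box_corners` and the parameterization are illustrative assumptions, not details taken from the paper.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box.

    Assumed convention (not from the paper): center (cx, cy), width w,
    height h, and rotation angle theta in radians.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own frame, before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate it to the box center.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]
```

With theta = 0 the function returns the corners of an ordinary axis-aligned box, which makes the convention easy to check by hand.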


Pages: 12935-12951

Page count: 17

Number of references: 75
