OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images

Times cited: 14

Authors:

Zhao, Jiaqi [1,2]

Ding, Zeyu [1]

Zhou, Yong [1]

Zhu, Hancheng [1]

Du, Wen-Liang [1]

Yao, Rui [1]

El Saddik, Abdulmotaleb [3]

Affiliations:

[1] China Univ Min & Technol, Mine Digitizat Engn Res Ctr, Sch Comp Sci & Technol, Minist Educ, Xuzhou 221116, Peoples R China

[2] China Univ Min & Technol, Innovat Res Ctr Disaster Intelligent Prevent & Eme, Xuzhou 221116, Peoples R China

[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

End-to-end detectors; oriented object detection; positional encoding (PE); remote sensing; transformer;

DOI:

10.1109/TGRS.2024.3456240

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Oriented object detection in remote sensing images is challenging because objects appear in arbitrary orientations. Recently, end-to-end transformer-based methods have succeeded by eliminating the post-processing operators required by traditional convolutional neural network (CNN)-based methods. However, directly extending transformers to oriented object detection raises three main issues: 1) objects rotate arbitrarily, so the angle must be encoded along with position and size; 2) self-attention lacks the geometric relations between oriented objects, because content and positional queries do not interact; and 3) oriented objects cause misalignment in cross-attention, mainly between values and positional queries, which makes accurate classification and localization difficult. In this article, we propose OrientedFormer, an end-to-end transformer-based oriented object detector consisting of three dedicated modules that address these issues. First, Gaussian positional encoding (PE) encodes the angle, position, and size of oriented boxes using Gaussian distributions. Second, Wasserstein self-attention introduces geometric relations and facilitates interaction between content and positional queries by utilizing Gaussian Wasserstein distance scores. Third, oriented cross-attention aligns values with positional queries by rotating the sampling points around each positional query according to its angle. Experiments on six datasets, namely DIOR-R, several versions of DOTA, HRSC2016, and ICDAR2015, demonstrate the effectiveness of our approach. Compared with previous end-to-end detectors, OrientedFormer improves AP50 by 1.16 and 1.21 points on DIOR-R and DOTA-v1.0, respectively, while reducing the training schedule from 3× to 1×. The code is available at https://github.com/wokaikaixinxin/OrientedFormer.
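The three geometric ideas named in the abstract (an oriented box as a Gaussian, a Gaussian Wasserstein distance between boxes, and sampling points rotated by the box angle) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the released OrientedFormer code: the box-to-Gaussian convention (mean at the box centre, covariance R diag(w²/4, h²/4) Rᵀ) is borrowed from the Gaussian Wasserstein distance literature, the -log(1 + W²) attention bias and all function names are hypothetical, and the sampling offsets are assumed to be expressed in the box's local frame. Learned projections, the sinusoidal part of any positional encoding, and batching are omitted.

```python
import math
from typing import List, Tuple

# Hypothetical types: an oriented box is (cx, cy, w, h, theta) with theta in
# radians; a 2x2 covariance matrix is stored as nested tuples.
Box = Tuple[float, float, float, float, float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]
Gaussian = Tuple[Tuple[float, float], Matrix2]


def box_to_gaussian(box: Box) -> Gaussian:
    """Model an oriented box as a 2D Gaussian (mu, Sigma).

    Assumed convention: mu is the box centre and
    Sigma = R(theta) diag(w^2 / 4, h^2 / 4) R(theta)^T.
    """
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    # Entries of R diag(a, b) R^T written out for a 2x2 rotation matrix R.
    sxx = a * c * c + b * s * s
    syy = a * s * s + b * c * c
    sxy = (a - b) * c * s
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def gaussian_wasserstein_sq(g1: Gaussian, g2: Gaussian) -> float:
    """Squared 2-Wasserstein distance between two 2D Gaussians.

    W^2 = |mu1 - mu2|^2 + Tr(S1) + Tr(S2) - 2 Tr((S1^1/2 S2 S1^1/2)^1/2),
    where, for 2x2 matrices, the last trace equals
    sqrt(Tr(S1 S2) + 2 sqrt(det S1 * det S2)).
    """
    (m1x, m1y), s1 = g1
    (m2x, m2y), s2 = g2
    centre_term = (m1x - m2x) ** 2 + (m1y - m2y) ** 2
    tr1 = s1[0][0] + s1[1][1]
    tr2 = s2[0][0] + s2[1][1]
    tr_prod = (s1[0][0] * s2[0][0] + s1[0][1] * s2[1][0]
               + s1[1][0] * s2[0][1] + s1[1][1] * s2[1][1])
    det1 = s1[0][0] * s1[1][1] - s1[0][1] * s1[1][0]
    det2 = s2[0][0] * s2[1][1] - s2[0][1] * s2[1][0]
    # Clamp tiny negative values caused by floating-point rounding.
    cross = math.sqrt(max(tr_prod + 2.0 * math.sqrt(max(det1 * det2, 0.0)), 0.0))
    return centre_term + tr1 + tr2 - 2.0 * cross


def wasserstein_attention_bias(boxes: List[Box]) -> List[List[float]]:
    """Hypothetical pairwise self-attention bias: -log(1 + W^2).

    Geometrically closer boxes receive a larger (less negative) bias. The
    log(1 + d) mapping is an illustrative choice, not the paper's formula.
    """
    gs = [box_to_gaussian(b) for b in boxes]
    return [[-math.log1p(max(gaussian_wasserstein_sq(gi, gj), 0.0)) for gj in gs]
            for gi in gs]


def rotate_sampling_points(box: Box,
                           offsets: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Rotate local sampling offsets by the box angle, then shift to the centre.

    Offsets are assumed to be given in the box's unrotated local frame.
    """
    cx, cy, _, _, theta = box
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 4.0, 2.0, 0.0),          # 4x2 box, axis-aligned
             (0.0, 0.0, 2.0, 4.0, math.pi / 2),  # same rectangle, described rotated
             (3.0, 4.0, 2.0, 2.0, 0.0)]
    for row in wasserstein_attention_bias(boxes):
        print([round(v, 3) for v in row])
    print(rotate_sampling_points(boxes[1], [(1.0, 0.0), (0.0, 1.0)]))
```

In the usage example, boxes 0 and 1 describe the same 4×2 rectangle with different (w, h, θ) parameters, so their Wasserstein distance, and hence their bias, is zero up to rounding. This illustrates why a Gaussian representation sidesteps the ambiguity of angle-based box parameterisations, while the distant box 2 receives a strongly negative bias.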

Pages: 16

References: 61