HFSI-TF: Hierarchical Full-Scale Interactive Transformer Model for Object Detection in Remote Sensing Image

Cited by: 1
Authors
Li, Daxiang [1, 2]
Li, Bingying [1]
Liu, Ying [1, 2]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Telecommun & Informat Engn, Xian 710121, Peoples R China
[2] Xian Key Lab Image Proc Technol & Applicat Publ Se, Xian 710121, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformers; Iterative decoding; Encoding; Accuracy; Feature extraction; Decoding; Computer architecture; Semantics; Computational modeling; Hierarchical full-scale interactive (HFSI); mixed cross attention (MCA); object detection; remote sensing image (RSI);
DOI
10.1109/LGRS.2024.3482693
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708 ; 070902 ;
Abstract
Transformer-based object detection models usually adopt an encoder-decoder architecture built mainly from self-attention (SA) and multilayer perceptron (MLP) layers. Although this architecture does not require non-maximum suppression (NMS) and achieves truly end-to-end object detection, it perceives multiscale objects in the image insufficiently, which leads to low accuracy on small objects. To address this issue, a new full-scale bidirectional interactive attention (FSBDIA) mechanism is constructed, and on that basis a novel hierarchical full-scale interactive transformer (HFSI-TF) model is designed for object detection in remote sensing images (RSIs). First, to enhance the multiscale perception ability of the model, the FSBDIA mechanism is designed under the guidance of full-scale information. Then, based on FSBDIA, a hierarchical HFSI-TF encoder is constructed to interactively fuse multilayer feature maps layer by layer, thereby obtaining multiscale encoded features of the RSI. Finally, a mixed cross attention (MCA) mechanism is constructed, and an iterative decoding architecture is designed based on it to improve the accuracy of small object detection. Comparative experiments on two benchmark datasets (DIOR and HRSC2016) show that the HFSI-TF model effectively improves the accuracy of object detection in RSIs and outperforms other state-of-the-art methods.
Pages: 5