HFSI-TF: Hierarchical Full-Scale Interactive Transformer Model for Object Detection in Remote Sensing Image

Times cited: 1
Authors
Li, Daxiang [1, 2]
Li, Bingying [1]
Liu, Ying [1, 2]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Telecommun & Informat Engn, Xian 710121, Peoples R China
[2] Xian Key Lab Image Proc Technol & Applicat Publ Se, Xian 710121, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Transformers; Iterative decoding; Encoding; Accuracy; Feature extraction; Decoding; Computer architecture; Semantics; Computational modeling; Hierarchical full-scale interactive (HFSI); mixed cross attention (MCA); object detection; remote sensing image (RSI);
DOI
10.1109/LGRS.2024.3482693
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708; 070902;
Abstract
Transformer-based object detection models usually adopt an encoder-decoder architecture built mainly from self-attention (SA) and multilayer perceptron (MLP) layers. Although this architecture does not require nonmaximum suppression (NMS) and thus achieves truly end-to-end object detection, it perceives objects at multiple scales in the image poorly, which leads to low accuracy on small objects. To address these issues, a new full-scale bidirectional interactive attention (FSBDIA) mechanism is constructed, and on this basis a novel hierarchical full-scale interactive transformer (HFSI-TF) model is designed for object detection in remote sensing images (RSIs). First, to enhance the multiscale perception ability of the model, the FSBDIA mechanism is designed under the guidance of full-scale information. Then, based on FSBDIA, a hierarchical HFSI-TF encoder is constructed that interactively fuses multilayer feature maps layer by layer, yielding multiscale encoded features of the RSI. Finally, a mixed cross attention (MCA) mechanism is constructed, and an iterative decoding architecture built on it improves the accuracy of small object detection. Comparative experiments on two benchmark datasets (DIOR and HRSC2016) show that the HFSI-TF model effectively improves the accuracy of object detection in RSIs and outperforms other state-of-the-art methods.
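The abstract takes as its starting point the generic Transformer encoder layer built from self-attention (SA) and an MLP. As a rough illustration of that baseline only, and not of HFSI-TF, FSBDIA, or MCA (whose internals are not described in this record), the pure-Python sketch below runs one single-head encoder layer over a flattened feature map. Assumptions: post-LayerNorm without learned scale or shift, a ReLU MLP, one attention head, and random toy weights; all function and parameter names (`encoder_layer`, `wq`, `w1`, and so on) are hypothetical.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def layer_norm(a: Matrix, eps: float = 1e-5) -> Matrix:
    """Per-token normalisation to zero mean and unit variance (no learned parameters)."""
    out = []
    for row in a:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def self_attention(x: Matrix, p: Dict[str, Matrix]) -> Matrix:
    """Single-head scaled dot-product self-attention: every token attends to every token."""
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    scale = math.sqrt(len(p["wq"][0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def mlp(x: Matrix, p: Dict[str, Matrix]) -> Matrix:
    """Two-layer position-wise MLP with a ReLU in between."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, p["w1"])]
    return matmul(hidden, p["w2"])


def encoder_layer(x: Matrix, p: Dict[str, Matrix]) -> Matrix:
    """SA sublayer then MLP sublayer, each with a residual connection and post-LayerNorm."""
    h = layer_norm(add(x, self_attention(x, p)))
    return layer_norm(add(h, mlp(h, p)))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical random weights / features for the toy example."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy input: a 2x2 feature map with 4 channels, flattened into 4 tokens.
rng = random.Random(0)
d_model, d_hidden = 4, 8
params = {
    "wq": rand_matrix(d_model, d_model, rng),
    "wk": rand_matrix(d_model, d_model, rng),
    "wv": rand_matrix(d_model, d_model, rng),
    "w1": rand_matrix(d_model, d_hidden, rng),
    "w2": rand_matrix(d_hidden, d_model, rng),
}
tokens = rand_matrix(4, d_model, rng)
out = encoder_layer(tokens, params)
print(len(out), len(out[0]))  # prints: 4 4
```

The layer keeps the token count and width unchanged, so layers can be stacked. Because this sketch only ever sees a single flattened feature map, it has no explicit mechanism for relating features at different resolutions; that is the multiscale limitation the abstract says HFSI-TF addresses by fusing multilayer feature maps.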
Pages: 5