A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Cited by: 0
Authors
Li, Kaiyuan [1]
Xue, Yong [1]
Zhao, Jiaqi [2]
Li, Honghao [1]
Zhang, Sheng [1]
Affiliations
[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China
[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;
DOI
10.1117/1.JRS.18.036506
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Remote sensing image scene recognition assigns semantic category labels to images based on their content, and it has applications in many fields. However, extracting category features is a great challenge when labeled samples are insufficient. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. The proposed model is composed of three modules: a multi-scale feature extraction module, a shift-window transformer module, and a multi-scale cross-attention module. First, to use the available data more efficiently, we design a multi-scale module that extracts both the object information and the spatial information contained in the image. Second, a hierarchical transformer structure based on shifted windows, which adapt flexibly to different scales, matches the computation of the multi-scale features. Third, a token fusion method based on the cross-attention mechanism fuses features between the multi-branch tokens and the class tokens, so that the information in the tokens is fully learned and classification results improve. In addition, we integrate existing open-source remote sensing image datasets into a new dataset that is better suited to scene recognition with limited data. Experimental results show that the proposed method performs well in scene classification of remote sensing images with limited data. The top-1 accuracy of the developed method is 79.84% with a 20% training ratio, 84.78% with a 40% training ratio, 89.79% with a 60% training ratio, and 91.43% with an 80% training ratio.
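The cross-attention token fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, identity query/key/value projections (no learned weights), and hypothetical toy inputs, and it only shows how a class token from one branch could attend to patch tokens from another branch. The names `cross_attend`, `cls_fine`, and `patches_coarse` are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(cls_token, patch_tokens):
    """Single-head cross-attention: the class token queries the patch tokens.

    Assumption: identity query/key/value projections, so only the
    attention pattern is illustrated (no learned weights).
    """
    dim = len(cls_token)
    # Keys/values: the class token itself plus the other branch's patches.
    keys = [cls_token] + patch_tokens
    # Scaled dot-product score between the query and every key.
    scores = [sum(q * k for q, k in zip(cls_token, key)) / math.sqrt(dim)
              for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors (values equal keys here).
    attended = [sum(w * key[i] for w, key in zip(weights, keys))
                for i in range(dim)]
    # Residual connection back onto the class token.
    fused = [c + a for c, a in zip(cls_token, attended)]
    return fused, weights

# Hypothetical toy inputs: a 4-dim class token from a fine-scale branch
# and three 4-dim patch tokens from a coarse-scale branch.
cls_fine = [1.0, 0.0, 0.0, 0.0]
patches_coarse = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]]
fused, weights = cross_attend(cls_fine, patches_coarse)
```

In this toy case the patch token that matches the class token receives more attention weight than the orthogonal ones, which is the behavior a fusion step of this kind relies on. A real model would add learned projections, multiple heads, and normalization layers.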
Pages: 19
Related papers
50 in total
  • [21] CAMCFormer: Cross-Attention and Multicorrelation Aided Transformer for Few-Shot Object Detection in Optical Remote Sensing Images
    Wang, Lefan
    Mei, Shaohui
    Wang, Yi
    Lian, Jiawei
    Han, Zonghao
    Feng, Yan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [22] VISUAL QUESTION ANSWERING IN REMOTE SENSING WITH CROSS-ATTENTION AND MULTIMODAL INFORMATION BOTTLENECK
    Songara, Jayesh
    Pande, Shivam
    Choudhury, Shabnam
    Banerjee, Biplab
    Velmurugan, Rajbabu
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 6278 - 6281
  • [23] Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
    Chen, Yan
    Dong, Quan
    Wang, Xiaofeng
    Zhang, Qianchuan
    Kang, Menglei
    Jiang, Wenxiang
    Wang, Mengyuan
    Xu, Lixiang
    Zhang, Chen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 4421 - 4435
  • [24] REVIEW OF VISION TRANSFORMER MODELS FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION
    Lv, Pengyuan
    Wu, Wenjun
    Zhong, Yanfei
    Zhang, Liangpei
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2231 - 2234
  • [25] A Review of Data Augmentation Methods of Remote Sensing Image Target Recognition
    Hao, Xuejie
    Liu, Lu
    Yang, Rongjin
    Yin, Lizeyan
    Zhang, Le
    Li, Xiuhong
    REMOTE SENSING, 2023, 15 (03)
  • [26] Multiscale Dense Cross-Attention Mechanism with Covariance Pooling for Hyperspectral Image Scene Classification
    Liu, Runmin
    Ning, Xin
    Cai, Weiwei
    Li, Guangjun
    MOBILE INFORMATION SYSTEMS, 2021, 2021
  • [27] Cross-Attention-Driven Adaptive Graph Relational Network for Multilabel Remote Sensing Scene Classification
    Bi, Haixia
    Chang, Honghao
    Wang, Xiaotian
    Hong, Danfeng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [28] Equivariant Attention Graph Capsule Network for Remote Sensing Scene Recognition
    Bian, Xiaoyong
    Chen, Xi
    Yu, Guorong
    Du, Qian
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2025, 22
  • [29] Multi-Attention Multi-Image Super-Resolution Transformer (MAST) for Remote Sensing
    Li, Jiaao
    Lv, Qunbo
    Zhang, Wenjian
    Zhu, Baoyu
    Zhang, Guiyu
    Tan, Zheng
    REMOTE SENSING, 2023, 15 (17)
  • [30] Object Detection in Multispectral Remote Sensing Images Based on Cross-Modal Cross-Attention
    Zhao, Pujie
    Ye, Xia
    Du, Ziang
    SENSORS, 2024, 24 (13)