A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Citations: 0
Authors
Li, Kaiyuan [1]
Xue, Yong [1]
Zhao, Jiaqi [2]
Li, Honghao [1]
Zhang, Sheng [1]
Affiliations
[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China
[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;
DOI
10.1117/1.JRS.18.036506
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
The aim of remote sensing image scene recognition is to assign semantic category labels to images based on their contents, and it has a wide range of applications in many fields. However, extracting category features from insufficient labeled samples is a great challenge. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. Specifically, the proposed model is composed of three modules: a multi-scale feature extraction module, a shift-window transformer module, and a multi-scale cross-attention module. First, to enhance the efficiency of data utilization, we design a multi-scale module to fully extract the object information and spatial information contained in the image. Second, the hierarchical transformer structure based on shifted windows, which is flexible across scales, matches the computation of multi-scale features. Third, the token fusion method based on the cross-attention mechanism fuses the features between multi-branch tokens and class tokens, which fully exploits the information in the tokens and achieves better classification results. In addition, we integrate existing open-source remote sensing image datasets into a new dataset better suited to the scene recognition task with limited data. Our experimental results show that the proposed method performs well in scene classification of remote sensing images with limited data. The top-1 accuracy of the developed method is 79.84% with a 20% training ratio, 84.78% with a 40% training ratio, 89.79% with a 60% training ratio, and 91.43% with an 80% training ratio.
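The cross-attention token fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the example assumes a single attention head, identity query/key/value projections, and a residual connection, and the names `cross_attend_cls`, `cls_small`, and `tokens_large` are hypothetical. It only shows the general idea of one branch's class token attending over another branch's tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend_cls(cls_token, other_tokens):
    """Fuse one branch's class token with another branch's tokens.

    cls_token:    list of d floats, used as the query.
    other_tokens: list of n tokens (each a list of d floats), used as
                  keys and values.
    Assumption: learned Q/K/V projections are replaced by the identity
    to keep the sketch short.
    Returns (fused class token, attention weights).
    """
    d = len(cls_token)
    scale = 1.0 / math.sqrt(d)
    # Scaled dot-product score between the class token and each token.
    scores = [
        scale * sum(q * k for q, k in zip(cls_token, tok))
        for tok in other_tokens
    ]
    weights = softmax(scores)
    # Weighted sum of the value tokens.
    attended = [
        sum(w * tok[i] for w, tok in zip(weights, other_tokens))
        for i in range(d)
    ]
    # Residual connection: add the attended summary to the class token.
    fused = [c + a for c, a in zip(cls_token, attended)]
    return fused, weights


# Toy data: a class token from one scale, three tokens from another scale.
cls_small = [1.0, 0.0, 0.0, 0.0]
tokens_large = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
]
fused, weights = cross_attend_cls(cls_small, tokens_large)
```

In this toy case the first token of the other branch is the one most aligned with the class token, so it receives the largest attention weight. A real model would add learned projections, multiple heads, and normalization.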
Pages: 19
Related papers
50 in total
  • [31] Pixel Representation Augmented through Cross-Attention for High-Resolution Remote Sensing Imagery Segmentation
    Luo, Yiyun
    Wang, Jinnian
    Yang, Xiankun
    Yu, Zhenyu
    Tan, Zixuan
    REMOTE SENSING, 2022, 14 (21)
  • [32] Self-Attention Network With Joint Loss for Remote Sensing Image Scene Classification
    Wu, Honglin
    Zhao, Shuzhen
    Li, Liang
    Lu, Chaoquan
    Chen, Wen
    IEEE ACCESS, 2020, 8 : 210347 - 210359
  • [33] Vision Transformer: An Excellent Teacher for Guiding Small Networks in Remote Sensing Image Scene Classification
    Xu, Kejie
    Deng, Peifang
    Huang, Hong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [34] PReLim: A Modeling Paradigm for Remote Sensing Image Scene Classification Under Limited Labeled Samples
    Dutta, Suparna
    Das, Monidipa
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2021, 2024, 13102 : 545 - 555
  • [35] PSCLI-TF: Position-Sensitive Cross-Layer Interactive Transformer Model for Remote Sensing Image Scene Classification
    Li, Daxiang
    Liu, Runyuan
    Tang, Yao
    Liu, Ying
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [36] Dual-Branch Cross-Attention Network for Micro-Expression Recognition with Transformer Variants
    Xie, Zhihua
    Zhao, Chuwei
    ELECTRONICS, 2024, 13 (02)
  • [37] Style and content separation network for remote sensing image cross-scene generalization
    Zhu, Sihan
    Wu, Chen
    Du, Bo
    Zhang, Liangpei
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2023, 201 : 1 - 11
  • [38] Interactive CNN and Transformer-Based Cross-Attention Fusion Network for Medical Image Classification
    Cai, Shu
    Zhang, Qiude
    Wang, Shanshan
    Hu, Junjie
    Zeng, Liang
    Li, Kaiyan
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2025, 35 (03)
  • [39] Combining Multilevel Features for Remote Sensing Image Scene Classification With Attention Model
    Ji, Jinsheng
    Zhang, Tao
    Jiang, Linfeng
    Zhong, Weilin
    Xiong, Huilin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2020, 17 (09) : 1647 - 1651
  • [40] SCViT: A Spatial-Channel Feature Preserving Vision Transformer for Remote Sensing Image Scene Classification
    Lv, Pengyuan
    Wu, Wenjun
    Zhong, Yanfei
    Du, Fang
    Zhang, Liangpei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60