A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Cited: 0
Authors
Li, Kaiyuan [1 ]
Xue, Yong [1 ]
Zhao, Jiaqi [2 ]
Li, Honghao [1 ]
Zhang, Sheng [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China
[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;
DOI
10.1117/1.JRS.18.036506
Chinese Library Classification
X [Environmental Science, Safety Science];
Subject Classification
08; 0830;
Abstract
Remote sensing image scene recognition aims to assign semantic category labels to images based on their content, and it has a wide range of applications across many fields. However, extracting category features from insufficiently labeled samples remains a great challenge. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. Specifically, the proposed model comprises three modules: a multi-scale feature extraction module, a shifted-window transformer module, and a multi-scale cross-attention module. First, to improve data utilization, we design a multi-scale module that fully extracts the object and spatial information contained in the image. The hierarchical transformer structure based on shifted windows, which are flexible at different scales, can match the computation of multi-scale features. A token fusion method based on the cross-attention mechanism then fuses features between multi-branch tokens and class tokens, fully exploiting the token information and yielding better classification results. In addition, we integrate existing open-source remote sensing image datasets into a new dataset better suited to scene recognition with limited data. Experimental results show that the proposed method performs well on scene classification of remote sensing images with limited data, achieving a top-1 accuracy of 79.84% with a 20% training ratio, 84.78% with 40%, 89.79% with 60%, and 91.43% with 80%.
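The cross-attention token fusion the abstract describes can be sketched as follows. This is a hypothetical, simplified NumPy illustration (single head, one class token, no learned projections), assuming the standard scaled dot-product formulation; it is not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(cls_token, branch_tokens):
    """A class token (1, d) queries another branch's patch tokens (N, d);
    the attended summary is added back to the class token as a residual."""
    d = cls_token.shape[-1]
    scores = cls_token @ branch_tokens.T / np.sqrt(d)   # (1, N) similarities
    weights = softmax(scores)                           # attention over N tokens
    attended = weights @ branch_tokens                  # (1, d) weighted sum
    return cls_token + attended                         # residual fusion

rng = np.random.default_rng(0)
cls_tok = rng.standard_normal((1, 96))   # class token of one scale branch
tokens = rng.standard_normal((49, 96))   # 7x7 patch tokens of another branch
fused = cross_attention_fuse(cls_tok, tokens)
print(fused.shape)  # (1, 96)
```

In a multi-branch design of this kind, each branch's class token would attend to the patch tokens of the other branches so that information flows between scales before classification.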
Pages: 19
Related Papers
50 entries total
  • [41] SCViT: A Spatial-Channel Feature Preserving Vision Transformer for Remote Sensing Image Scene Classification
    Lv, Pengyuan
    Wu, Wenjun
    Zhong, Yanfei
    Du, Fang
    Zhang, Liangpei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [42] Transformer based on channel-spatial attention for accurate classification of scenes in remote sensing image
    Guo, Jingxia
    Jia, Nan
    Bai, Jinniu
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [43] A General Transitive Transfer Learning Framework for Cross-Optical Sensor Remote Sensing Image Scene Understanding
    Tao, Chao
    Xiao, Rong
    Wang, Yuze
    Qi, Ji
    Li, Haifeng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 4248 - 4260
  • [44] HELViT: highly efficient lightweight vision transformer for remote sensing image scene classification
    Guo, Dongen
    Wu, Zechen
    Feng, Jiangfan
    Zhou, Zhuoke
    Shen, Zhen
    APPLIED INTELLIGENCE, 2023, 53 (21) : 24947 - 24962
  • [46] Road Extraction Method of Remote Sensing Image Based on Deformable Attention Transformer
    Zhao, Ling
    Zhang, Jianing
    Meng, Xiujun
    Zhou, Wenming
    Zhang, Zhenshi
    Peng, Chengli
    SYMMETRY-BASEL, 2024, 16 (04):
  • [47] Remote Sensing Image Scene Classification With Self-Supervised Paradigm Under Limited Labeled Samples
    Tao, Chao
    Qi, Ji
    Lu, Weipeng
    Wang, Hao
    Li, Haifeng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [48] DTCA: Dual-Branch Transformer with Cross-Attention for EEG and Eye Movement Data Fusion
    Zhang, Xiaoshan
    Shi, Enze
    Yu, Sigang
    Zhang, Shu
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT II, 2024, 15002 : 141 - 151
  • [49] Channel selection and local attention transformer model for semantic segmentation on UAV remote sensing scene
    Liu, Da
    Long, Hao
    Liu, Zhenbao
    IET IMAGE PROCESSING, 2025, 19 (01)
  • [50] High-Resolution Remote Sensing Image Classification With Limited Training Data
    Ariaei, Mehdi
    Ghassemian, Hassan
    Imani, Maryam
    PROCEEDINGS OF THE 13TH IRANIAN/3RD INTERNATIONAL MACHINE VISION AND IMAGE PROCESSING CONFERENCE, MVIP, 2024, : 47 - +