A hybrid CNN-vision transformer structure for remote sensing scene classification

Cited by: 1
Authors
Li, Nan [1]
Hao, Siyuan [2]
Zhao, Kun [1]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao, Shandong, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Remote sensing image; scene classification; swin-transformer; convolutional neural network; FUSION; MODEL;
DOI
10.1080/2150704X.2024.2302348
Chinese Library Classification number
TP7 [Remote sensing technology]
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
Vision Transformers (ViTs), built on the self-attention mechanism, have become one of the main architectures in deep learning and are emerging as an alternative to Convolutional Neural Networks (CNNs) for remote sensing scene classification. However, the earlier self-attention layers of ViTs focus on local rather than global features, while the deeper self-attention layers focus on global features but ignore the differing impact of information at different frequencies. Because self-attention has quadratic complexity in the length of the token sequence, this greatly increases the training and computational cost. In this paper, we propose a hybrid CNN-vision transformer structure (HCVNet), which replaces the earlier self-attention layers with convolutional layers and the deeper self-attention layers with a novel Frequency Multi-head Self Attention (F-MSA) mechanism. Specifically, F-MSA is a dual-stream structure that reduces the computational cost and improves classification performance by encoding high- and low-frequency information separately. In addition, a Semantic-aware Localization (SaL) module is introduced, which uses learned prior knowledge to guide crop selection and thus avoids sampling pure background. Our method achieved an accuracy of 97.20 ± 0.02% on the Aerial Image Dataset and 93.89 ± 0.03% on the NWPU-RESISC45 Dataset, at low computational complexity.
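The abstract does not specify how F-MSA separates or encodes the two frequency bands, so the following is only a minimal NumPy sketch of the general idea behind a dual-stream attention layer, not the authors' implementation. It rests on several assumptions that do not come from the paper: average pooling serves as the low-pass filter, the high-frequency part is the residual after subtracting the upsampled low-pass signal, the high-frequency stream uses attention inside non-overlapping windows, queries, keys and values are left unprojected, and the two streams are merged by summation. All function names (`dual_stream_attention`, `score_entries`, and the helpers) are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; works on (N, c) or batched (B, N, c) inputs."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def avg_pool_tokens(x, h, w, s):
    """Average-pool an (h*w, c) token grid with an s x s window (assumed low-pass filter)."""
    c = x.shape[-1]
    return x.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3)).reshape(-1, c)

def upsample_tokens(x, h, w, s):
    """Nearest-neighbour upsampling of pooled tokens back to (h*w, c)."""
    c = x.shape[-1]
    grid = x.reshape(h // s, w // s, c).repeat(s, axis=0).repeat(s, axis=1)
    return grid.reshape(h * w, c)

def dual_stream_attention(x, h, w, s=2):
    """Illustrative dual-stream attention over an (h*w, c) token sequence."""
    assert h % s == 0 and w % s == 0, "grid must be divisible by the window size"
    c = x.shape[-1]
    low = avg_pool_tokens(x, h, w, s)           # low-frequency summary tokens
    high = x - upsample_tokens(low, h, w, s)    # high-frequency residual

    # Low-frequency stream: every token attends to the pooled tokens only,
    # so the score matrix is (N, N / s^2) instead of (N, N).
    low_out = attention(x, low, low)

    # High-frequency stream: attention restricted to each s x s window.
    win = high.reshape(h // s, s, w // s, s, c).transpose(0, 2, 1, 3, 4)
    win = win.reshape(-1, s * s, c)
    win_out = attention(win, win, win)
    high_out = win_out.reshape(h // s, w // s, s, s, c).transpose(0, 2, 1, 3, 4)
    high_out = high_out.reshape(h * w, c)

    return low_out + high_out                   # assumed merge: simple sum

def score_entries(n, s):
    """Attention-score entries for full attention vs. the two streams above."""
    full = n * n
    low = n * (n // s**2)
    high = (n // s**2) * (s**2) ** 2
    return full, low + high
```

Under these assumptions, an 8 x 8 token grid (64 tokens) with `s=2` needs 4096 score entries for full self-attention but 1280 for the two streams combined, which illustrates why avoiding full attention over a long token sequence reduces the quadratic cost the abstract refers to.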
Pages: 88-98
Number of pages: 11
Related papers
50 in total
  • [1] MITformer: A Multiinstance Vision Transformer for Remote Sensing Scene Classification
    Sha, Zongyao
    Li, Jianfeng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [2] REVIEW OF VISION TRANSFORMER MODELS FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION
    Lv, Pengyuan
    Wu, Wenjun
    Zhong, Yanfei
    Zhang, Liangpei
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2231 - 2234
  • [3] Vision Transformer With Contrastive Learning for Remote Sensing Image Scene Classification
    Bi, Meiqiao
    Wang, Minghua
    Li, Zhi
    Hong, Danfeng
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 738 - 749
  • [4] HELViT: highly efficient lightweight vision transformer for remote sensing image scene classification
    Guo, Dongen
    Wu, Zechen
    Feng, Jiangfan
    Zhou, Zhuoke
    Shen, Zhen
    APPLIED INTELLIGENCE, 2023, 53 : 24947 - 24962
  • [5] SCViT: A Spatial-Channel Feature Preserving Vision Transformer for Remote Sensing Image Scene Classification
    Lv, Pengyuan
    Wu, Wenjun
    Zhong, Yanfei
    Du, Fang
    Zhang, Liangpei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [6] HELViT: highly efficient lightweight vision transformer for remote sensing image scene classification
    Guo, Dongen
    Wu, Zechen
    Feng, Jiangfan
    Zhou, Zhuoke
    Shen, Zhen
    APPLIED INTELLIGENCE, 2023, 53 (21) : 24947 - 24962
  • [7] Vision Transformer: An Excellent Teacher for Guiding Small Networks in Remote Sensing Image Scene Classification
    Xu, Kejie
    Deng, Peifang
    Huang, Hong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [8] Improved Bilinear CNN Model for Remote Sensing Scene Classification
    Li, Erzhu
    Samat, Alim
    Du, Peijun
    Liu, Wei
    Hu, Jinshan
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [9] Enhancing Remote Sensing Scene Classification With Hy-MSDA: A Hybrid CNN-Transformer for Multisource Domain Adaptation
    Xu, Kai
    Zhu, Zhou
    Wang, Wenxin
    Fan, Chengcheng
    Wu, Bocai
    Jia, Zhaohong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [10] When CNNs Meet Vision Transformer: A Joint Framework for Remote Sensing Scene Classification
    Deng, Peifang
    Xu, Kejie
    Huang, Hong
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19