A hybrid CNN-vision transformer structure for remote sensing scene classification

Cited by: 1
Authors
Li, Nan [1]
Hao, Siyuan [2]
Zhao, Kun [1]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao, Shandong, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Remote sensing image; scene classification; Swin Transformer; convolutional neural network; FUSION; MODEL
DOI
10.1080/2150704X.2024.2302348
CLC classification number
TP7 [Remote sensing technology]
Subject classification codes
081102; 0816; 081602; 083002; 1404
Abstract
Vision Transformers (ViTs), built on the self-attention mechanism, have become one of the main architectures in deep learning and an increasingly common alternative to Convolutional Neural Networks (CNNs) for remote sensing scene classification. However, the earlier self-attention layers of ViTs focus on local rather than global features, while the deeper self-attention layers focus on global features but ignore the differing impact of information at different frequencies. Moreover, the quadratic complexity of self-attention over long token sequences greatly increases training and computational cost. In this paper, we propose a hybrid CNN-vision transformer structure (HCVNet), which replaces the earlier self-attention layers with convolutional layers and the deeper self-attention layers with a novel Frequency Multi-head Self-Attention (F-MSA) mechanism. Specifically, F-MSA is a dual-stream structure that encodes high- and low-frequency information separately, which reduces computational cost and improves classification performance. In addition, we introduce a Semantic-aware Localization (SaL) module that learns prior knowledge to guide crop selection, avoiding crops that contain only background. Our method achieves accuracies of 97.20 ± 0.02% on the Aerial Image Dataset (AID) and 93.89 ± 0.03% on the NWPU-RESISC45 dataset, with low computational complexity.
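The abstract does not specify how F-MSA separates frequencies or organizes its two streams, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) the low-frequency part of a feature map is its average over non-overlapping pool × pool blocks and the high-frequency part is the residual, and (2) a hypothetical dual-stream attention in which high-frequency heads attend only within local windows while low-frequency heads attend to average-pooled keys/values. The function names `split_frequencies`, `global_attention_pairs`, `dual_stream_pairs` and the parameters `window`, `pool`, `heads_hi`, `heads_lo` are all hypothetical. The cost functions count query-key scores to show why such a split reduces the quadratic cost of global self-attention.

```python
from typing import List, Tuple

Grid = List[List[float]]


def split_frequencies(x: Grid, pool: int) -> Tuple[Grid, Grid]:
    """Split an h x w feature map into low/high-frequency parts (assumed scheme).

    Low frequency: mean of each non-overlapping pool x pool block, copied back
    to full resolution. High frequency: the residual x - low.
    """
    h, w = len(x), len(x[0])
    assert h % pool == 0 and w % pool == 0, "grid must be divisible by pool"
    low = [[0.0] * w for _ in range(h)]
    for bi in range(0, h, pool):
        for bj in range(0, w, pool):
            block = [x[i][j] for i in range(bi, bi + pool) for j in range(bj, bj + pool)]
            mean = sum(block) / len(block)
            for i in range(bi, bi + pool):
                for j in range(bj, bj + pool):
                    low[i][j] = mean
    high = [[x[i][j] - low[i][j] for j in range(w)] for i in range(h)]
    return low, high


def global_attention_pairs(h: int, w: int, heads: int) -> int:
    """Query-key scores computed by standard MSA: heads * N^2, with N = h * w."""
    n = h * w
    return heads * n * n


def dual_stream_pairs(h: int, w: int, heads_hi: int, heads_lo: int,
                      window: int, pool: int) -> int:
    """Query-key scores for a hypothetical dual-stream (high/low frequency) attention.

    High-frequency heads: each query attends within its window x window region.
    Low-frequency heads: each query attends to keys/values average-pooled by
    `pool` in both spatial dimensions.
    """
    assert h % window == 0 and w % window == 0, "grid must be divisible by window"
    assert h % pool == 0 and w % pool == 0, "grid must be divisible by pool"
    n = h * w
    hi = heads_hi * n * window * window
    lo = heads_lo * n * (h // pool) * (w // pool)
    return hi + lo


if __name__ == "__main__":
    low, high = split_frequencies([[1.0, 3.0], [5.0, 7.0]], pool=2)
    print(low)   # [[4.0, 4.0], [4.0, 4.0]]
    print(high)  # [[-3.0, -1.0], [1.0, 3.0]]
    full = global_attention_pairs(14, 14, heads=8)
    dual = dual_stream_pairs(14, 14, heads_hi=4, heads_lo=4, window=7, pool=2)
    print(full, dual, dual / full)  # 307328 76832 0.25
```

On a 14 × 14 token grid (N = 196) with 8 heads, global attention scores 8 · 196² = 307,328 query-key pairs, while the assumed 4/4 head split with 7 × 7 windows and 2 × 2 pooling scores 76,832, a quarter of the cost. The saving grows with N, because the global term is quadratic in N while both assumed streams are linear in N for fixed `window` and `pool`.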
Pages: 88-98
Page count: 11