A hybrid CNN-vision transformer structure for remote sensing scene classification

Cited by: 1
Authors
Li, Nan [1 ]
Hao, Siyuan [2 ]
Zhao, Kun [1 ]
Affiliations
[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao, Shandong, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Remote sensing image; scene classification; Swin Transformer; convolutional neural network; fusion; model
DOI
10.1080/2150704X.2024.2302348
Chinese Library Classification
TP7 [Remote Sensing Technology]
Subject Classification Codes
081102; 0816; 081602; 083002; 1404
Abstract
Vision Transformers (ViTs), built on the self-attention mechanism, have become one of the main architectures in deep learning and are emerging as an alternative to Convolutional Neural Networks (CNNs) for remote sensing scene classification. However, the earlier self-attention layers of ViTs attend mainly to local rather than global features, while the deeper self-attention layers capture global features but ignore the differing contributions of different frequency components; moreover, the quadratic complexity of self-attention over long token sequences greatly increases training and computational cost. In this paper, we propose a hybrid CNN-vision transformer structure (HCVNet), which replaces the earlier self-attention layers with convolutional layers and the deeper self-attention layers with a novel Frequency Multi-head Self-Attention (F-MSA) mechanism. Specifically, F-MSA is a dual-stream structure that encodes high- and low-frequency information separately, reducing computational cost while improving classification performance. In addition, a Semantic-aware Localization (SaL) module is introduced to guide crop selection by learning prior knowledge, avoiding pure-background sampling. Our method achieves an accuracy of 97.20 ± 0.02% on the Aerial Image Dataset and 93.89 ± 0.03% on the NWPU-RESISC45 dataset, with low computational cost.
Pages: 88-98
Number of pages: 11
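
To make the dual-stream F-MSA idea described in the abstract concrete, below is a minimal PyTorch sketch of a block that routes part of the channels through a local depthwise convolution (a high-frequency stream) and the rest through self-attention over an average-pooled token sequence (a low-frequency stream). All names here (DualFrequencyBlock, high_ratio, pool) are illustrative assumptions, not the authors' code; the actual F-MSA and HCVNet designs in the paper may differ in detail.

# Hedged sketch of a dual-stream high/low-frequency attention block.
# Illustrative only; not the authors' F-MSA implementation.
import torch
import torch.nn as nn


class DualFrequencyBlock(nn.Module):
    """Splits channels into a high-frequency (local, convolutional) stream
    and a low-frequency (global, attention-over-pooled-tokens) stream."""

    def __init__(self, dim=96, num_heads=4, pool=2, high_ratio=0.5):
        super().__init__()
        self.high_dim = int(dim * high_ratio)   # channels for the local stream
        self.low_dim = dim - self.high_dim      # channels for the global stream

        # High-frequency stream: a cheap depthwise conv keeps local detail.
        self.local = nn.Conv2d(self.high_dim, self.high_dim, kernel_size=3,
                               padding=1, groups=self.high_dim)

        # Low-frequency stream: average-pool before attention, shortening the
        # token sequence and thus cutting the quadratic attention cost.
        self.pool = nn.AvgPool2d(pool)
        self.attn = nn.MultiheadAttention(self.low_dim, num_heads, batch_first=True)
        self.up = nn.Upsample(scale_factor=pool, mode="nearest")

        self.proj = nn.Conv2d(dim, dim, kernel_size=1)  # fuse the two streams

    def forward(self, x):                        # x: (B, dim, H, W)
        xh, xl = torch.split(x, [self.high_dim, self.low_dim], dim=1)

        xh = self.local(xh)                      # local / high-frequency features

        b, c, h, w = xl.shape
        t = self.pool(xl)                        # (B, c, H/pool, W/pool)
        ph, pw = t.shape[-2:]
        t = t.flatten(2).transpose(1, 2)         # (B, N', c) token sequence
        t, _ = self.attn(t, t, t)                # global self-attention
        xl = self.up(t.transpose(1, 2).reshape(b, c, ph, pw))

        return self.proj(torch.cat([xh, xl], dim=1))


if __name__ == "__main__":
    block = DualFrequencyBlock(dim=96)
    feats = torch.randn(2, 96, 28, 28)           # e.g. a mid-stage feature map
    print(block(feats).shape)                     # torch.Size([2, 96, 28, 28])

Pooling before attention is what keeps the global stream sub-quadratic in the original sequence length; the concatenation followed by a 1x1 convolution is one simple way to fuse the two streams and is only a stand-in for whatever fusion the paper actually uses.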