A hybrid CNN-vision transformer structure for remote sensing scene classification

Cited by: 1

Authors:

Li, Nan [1]

Hao, Siyuan [2]

Zhao, Kun [1]

Affiliations:

[1] Qingdao Univ Technol, Sch Informat & Control Engn, Qingdao, Shandong, Peoples R China

[2] Beijing Jiaotong Univ, Sch Software Engn, Beijing, Peoples R China

Source:

REMOTE SENSING LETTERS | 2024 / Vol. 15 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Remote sensing image; scene classification; swin-transformer; convolutional neural network; FUSION; MODEL;

DOI:

10.1080/2150704X.2024.2302348

Chinese Library Classification (CLC) number:

TP7 [Remote sensing technology];

Discipline classification codes:

081102 ; 0816 ; 081602 ; 083002 ; 1404 ;

Abstract:

Vision Transformers (ViTs), built on the self-attention mechanism, have become one of the main architectures in deep learning and are emerging as an alternative to Convolutional Neural Networks (CNNs) for remote sensing scene classification. However, the earlier self-attention layers of ViTs focus on local rather than global features, while the deeper self-attention layers focus on global features but ignore the differing impact of information at different frequencies. Moreover, the quadratic complexity of self-attention over long sequence representations greatly increases the training and computational cost. In this paper, we propose a hybrid CNN-vision transformer structure (HCVNet), which replaces the earlier self-attention layers with convolutional layers and the deeper self-attention layers with a novel Frequency Multi-head Self Attention (F-MSA) mechanism. Specifically, F-MSA is a dual-stream structure that reduces the computational cost and improves classification performance by encoding high- and low-frequency information separately. In addition, a Semantic-aware Localization (SaL) module is introduced, which guides crop selection by learning prior knowledge and thereby avoids sampling pure background. Our method achieved an accuracy of 97.20 ± 0.02% on the Aerial Image Dataset and 93.89 ± 0.03% on the NWPU-RESISC45 Dataset, with low complexity cost.
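The abstract does not give the internals of F-MSA, so the following is only a generic illustration of two ideas it mentions, not the authors' method. It assumes one common way to separate frequencies (block averaging as a low-pass filter, with the residual as the high-frequency part) and counts attention score entries to show where the quadratic cost comes from. The function names, the 2x2 window, and the 14x14 token grid are all hypothetical choices made for this sketch.

```python
def split_frequencies(image, window=2):
    """Split a 2D grid into low- and high-frequency parts (illustrative only).

    Low frequency: every window x window block is replaced by its mean.
    High frequency: the residual, image - low.
    """
    height, width = len(image), len(image[0])
    if height % window or width % window:
        raise ValueError("image sides must be multiples of window")
    low = [[0.0] * width for _ in range(height)]
    for top in range(0, height, window):
        for left in range(0, width, window):
            rows = range(top, top + window)
            cols = range(left, left + window)
            mean = sum(image[r][c] for r in rows for c in cols) / (window * window)
            for r in rows:
                for c in cols:
                    low[r][c] = mean
    high = [[image[r][c] - low[r][c] for c in range(width)] for r in range(height)]
    return low, high


def attention_score_count(num_queries, num_keys):
    """Number of query-key scores one attention head has to compute."""
    return num_queries * num_keys


# Toy 4x4 "image" (assumed values, chosen so the block means are easy to check).
image = [[1, 3, 5, 5],
         [1, 3, 5, 5],
         [2, 2, 0, 8],
         [2, 2, 8, 0]]
low, high = split_frequencies(image)

# Assumed token grid: 14 x 14 tokens, with keys pooled to 7 x 7 in a low-frequency stream.
tokens = 14 * 14
pooled = 7 * 7
full_cost = attention_score_count(tokens, tokens)     # 38416 scores
pooled_cost = attention_score_count(tokens, pooled)   # 9604 scores
```

In this toy setting, attending to keys pooled by a factor of 2 per side cuts the score count by a factor of 4, which is the kind of saving a separate low-frequency stream could plausibly provide. Whether F-MSA pools keys in this way is not stated in the abstract.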


Pages: 88-98

Page count: 11

