A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data

Cited by: 0

Authors:

Li, Kaiyuan [1]

Xue, Yong [1]

Zhao, Jiaqi [2]

Li, Honghao [1]

Zhang, Sheng [1]

Affiliations:

[1] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou, Peoples R China

[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Peoples R China

Source:

JOURNAL OF APPLIED REMOTE SENSING | 2024 / Vol. 18 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

recognition; scene classification; insufficient data; remote sensing; deep learning; cross attention; CLASSIFICATION; NETWORK; TREE;

DOI:

10.1117/1.JRS.18.036506

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Remote sensing image scene recognition aims to assign semantic category labels to images based on their contents, and it has a wide range of applications in many fields. However, extracting category features from insufficient labeled samples is a great challenge. We propose a Multi-scale Shift-window Cross-attention Vision Transformer (MSC-ViT) framework for remote sensing image scene recognition with limited data. Specifically, the proposed model is composed of three modules: a multi-scale feature extraction module, a shift-window transformer module, and a multi-scale cross-attention module. First, to enhance the efficiency of data utilization, we design a multi-scale module to fully extract the object information and spatial information contained in the image. Second, the hierarchical transformer structure based on shifted windows, which are flexible at different scales, matches the computation of multi-scale features. Third, the token fusion method based on the cross-attention mechanism fuses the features between multi-branch tokens and class tokens, fully learning the information in the tokens and achieving better classification results. In addition, we integrate existing open-source datasets of remote sensing images into a new dataset that is better suited to scene recognition with limited data. Our experimental results show that the proposed method performs well in scene classification of remote sensing images with limited data. The top-1 accuracy of the developed method is 79.84% with a 20% training ratio, 84.78% with a 40% training ratio, 89.79% with a 60% training ratio, and 91.43% with an 80% training ratio.
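The token fusion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MSC-ViT implementation: it assumes the common pattern in which a class token from one scale branch acts as the query over the patch tokens of another branch, and it replaces the learned query/key/value projections with identity maps. The function names and the toy data are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(cls_token, patch_tokens):
    """Fuse one class token (query) with patch tokens (keys and values).

    Assumption: identity projections stand in for the learned
    query/key/value matrices, to keep the sketch short.
    """
    dim = len(cls_token)
    # Scaled dot-product score between the class token and each patch token.
    scores = [sum(q * k for q, k in zip(cls_token, tok)) / math.sqrt(dim)
              for tok in patch_tokens]
    weights = softmax(scores)
    # Attention output: weighted sum of the patch tokens.
    attended = [sum(w * tok[i] for w, tok in zip(weights, patch_tokens))
                for i in range(dim)]
    # Residual connection keeps the original class-token information.
    fused = [c + a for c, a in zip(cls_token, attended)]
    return fused, weights

# Hypothetical toy data: a class token from one scale branch attends to
# patch tokens from another scale branch.
cls_small = [1.0, 0.0, 0.0, 0.0]
patches_large = [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]]
fused, weights = cross_attention(cls_small, patches_large)
```

In this toy case the patch token most similar to the class token receives the largest attention weight, so the fused class token is pulled toward it. A real model would apply learned projections, multiple heads, and normalization around this step.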

Pages: 19

50 items in total

[1] Remote sensing image change detection based on swin transformer and cross-attention mechanism
Yan, Weidong
Cao, Li
Yan, Pei
Zhu, Chaosheng
Wang, Mengtian
EARTH SCIENCE INFORMATICS, 2025, 18 (01)
[2] Remote Sensing Image Classification Based on a Cross-Attention Mechanism and Graph Convolution
Cai, Weiwei
Wei, Zhanguo
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[3] Deformable Cross-Attention Transformer for Medical Image Registration
Chen, Junyu
Liu, Yihao
He, Yufan
Du, Yong
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 115 - 125
[4] Remote Sensing Image Scene Classification Based on an Enhanced Attention Module
Zhao, Zhicheng
Li, Jiaqi
Luo, Ze
Li, Jian
Chen, Can
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (11) : 1926 - 1930
[5] MCADNet: A Multi-Scale Cross-Attention Network for Remote Sensing Image Dehazing
Tao, Tao
Xu, Haoran
Guan, Xin
Zhou, Hao
MATHEMATICS, 2024, 12 (23)
[6] Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration
Gu, Yubin
Chen, Siting
Sun, Xiaoshuai
Ji, Jiayi
Zhou, Yiyi
Ji, Rongrong
PATTERN RECOGNITION, 2025, 164
[7] Selective Alignment Transformer for Partial-Set Remote Sensing Image Cross-Scene Classification
Li, Kun
Liu, Zhunga
Zhang, Zuowei
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[8] Vision Transformer With Contrastive Learning for Remote Sensing Image Scene Classification
Bi, Meiqiao
Wang, Minghua
Li, Zhi
Hong, Danfeng
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 738 - 749
[9] A Cross-Attention Mechanism Based on Regional-Level Semantic Features of Images for Cross-Modal Text-Image Retrieval in Remote Sensing
Zheng, Fuzhong
Li, Weipeng
Wang, Xu
Wang, Luyao
Zhang, Xiong
Zhang, Haisu
APPLIED SCIENCES-BASEL, 2022, 12 (23)
[10] Spatial-Spectral Transformer With Cross-Attention for Hyperspectral Image Classification
Peng, Yishu
Zhang, Yuwen
Tu, Bing
Li, Qianming
Li, Wujing
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60