Classification of High-Resolution Remote Sensing Image Based on Swin Transformer and Convolutional Neural Network

Cited: 8
Authors
He Xiaoying [1 ,2 ,3 ]
Xu Weiming [1 ,2 ,3 ]
Pan Kaixiang [1 ,2 ,3 ]
Wang Juan [1 ,2 ,3 ]
Li Ziwei [1 ,2 ,3 ]
Affiliations
[1] Fuzhou Univ, Acad Digital China, Fuzhou 350108, Fujian, Peoples R China
[2] Fuzhou Univ, Minist Educ, Key Lab Spatial Data Min Informat Sharing, Fuzhou 350002, Fujian, Peoples R China
[3] Fuzhou Univ, Natl Engn Res Ctr Geospatial Informat Technol, Fuzhou 350002, Fujian, Peoples R China
Keywords
high-resolution remote sensing image; convolutional neural network; Swin Transformer; feature fusion; semantic segmentation;
DOI
10.3788/LOP232003
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Number
0808; 0809;
Abstract
Existing deep learning-based methods for the intelligent interpretation of remote sensing images struggle to capture global information directly, resulting in blurred object edges and low classification accuracy between similar classes. This study proposes a semantic segmentation model called SRAU-Net based on Swin Transformer and a convolutional neural network. SRAU-Net adopts a U-shaped Swin Transformer encoder-decoder framework and introduces several improvements to address the limitations of previous methods. First, Swin Transformer and a convolutional neural network are combined into a dual-branch encoder, which effectively captures spatial details at different scales and complements the contextual features, yielding higher classification accuracy and sharper object edges. Second, a feature fusion module is designed as a bridge between the two encoder branches. This module efficiently fuses global and local features along the channel and spatial dimensions, improving segmentation accuracy for small target objects. Moreover, SRAU-Net incorporates a feature enhancement module that uses attention mechanisms to adaptively fuse features from the encoder and decoder and to strengthen the aggregation of spatial and semantic features, further improving the model's ability to extract features from remote sensing images. The effectiveness of SRAU-Net is demonstrated on the ISPRS Vaihingen dataset for land cover classification. The results show that SRAU-Net outperforms the compared models in overall accuracy and F1 score, achieving 92.06% and 86.90%, respectively. Notably, SRAU-Net excels at extracting object edge information and accurately classifying small-scale regions, improving overall classification accuracy by 2.57 percentage points over the original model. Furthermore, it effectively distinguishes remote sensing objects with similar characteristics, such as trees and low vegetation.
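The abstract describes a fusion module that combines global (Swin Transformer) and local (CNN) features along the channel and spatial dimensions. The paper's exact module is not reproduced here; the following is a minimal, hypothetical NumPy sketch of that general idea, in the style of sequential channel-then-spatial attention (as in CBAM-like designs). The function name and the use of global average pooling with a sigmoid gate are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_global_local(global_feat, local_feat):
    """Illustrative channel-then-spatial attention fusion (assumption, not
    the SRAU-Net module). Inputs: arrays of shape (C, H, W)."""
    x = global_feat + local_feat                    # merge the two branches
    # Channel attention: weight each channel by its pooled response.
    channel_desc = x.mean(axis=(1, 2))              # (C,)
    x = x * _sigmoid(channel_desc)[:, None, None]
    # Spatial attention: weight each location by its cross-channel mean.
    spatial_desc = x.mean(axis=0)                   # (H, W)
    return x * _sigmoid(spatial_desc)[None, :, :]

# Example: fuse two 4-channel 8x8 feature maps.
g = np.random.rand(4, 8, 8)
l = np.random.rand(4, 8, 8)
fused = fuse_global_local(g, l)   # shape (4, 8, 8)
```

In a real model the pooled descriptors would typically pass through small learned layers before the sigmoid gates; the sketch omits learned weights to keep the data flow visible.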
Pages: 12
References
33 in total
[1]   An Active Deep Learning Approach for Minimally Supervised PolSAR Image Classification [J].
Bi, Haixia ;
Xu, Feng ;
Wei, Zhiqiang ;
Xue, Yong ;
Xu, Zongben .
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2019, 57 (11) :9378-9395
[2]  
Cao Hu, 2023, Computer Vision - ECCV 2022 Workshops: Proceedings. Lecture Notes in Computer Science (13803), P205, DOI 10.1007/978-3-031-25066-8_9
[3]  
Chen J., 2021, arXiv, DOI 10.48550/ARXIV.2102.04306
[4]   Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].
Chen, Liang-Chieh ;
Zhu, Yukun ;
Papandreou, George ;
Schroff, Florian ;
Adam, Hartwig .
COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851
[5]  
Chen Ting, 2022, Journal of Geo-Information Science, V24, P263
[6]  
di Gregorio A., 2023, LCCS [EB/OL]
[7]  
Dosovitskiy A., 2021, arXiv, DOI 10.48550/ARXIV.2010.11929
[8]   Land Cover Classification of Resources Survey Remote Sensing Images Based on Segmentation Model [J].
Fan, Zhenyu ;
Zhan, Tao ;
Gao, Zhichao ;
Li, Rui ;
Liu, Yao ;
Zhang, Lianzhi ;
Jin, Zixiang ;
Xu, Supeng .
IEEE ACCESS, 2022, 10 :56267-56281
[9]   Dual Attention Network for Scene Segmentation [J].
Fu, Jun ;
Liu, Jing ;
Tian, Haijie ;
Li, Yong ;
Bao, Yongjun ;
Fang, Zhiwei ;
Lu, Hanqing .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :3141-3149
[10]  
Gao L, 2008, Transactions of the Chinese Society of Agricultural Engineering, V24, P73