FEST: Feature Enhancement Swin Transformer for Remote Sensing Image Semantic Segmentation

Citations: 0
Authors
Zhang, Ronghuan [1, 2, 3]
Zhao, Jing [1, 2, 3]
Li, Ming [4]
Zou, Qingzhi [1, 2, 3]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Key Lab Comp Power Network & Informat Secur,Minis, Jinan, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Fac Comp Sci & Technol, Shandong Engn Res Ctr Big Data Appl Technol, Jinan, Peoples R China
[3] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Networks, Jinan, Peoples R China
[4] Shandong Univ Tradit Chinese Med, Sch Intelligence & Informat Engn, Jinan, Peoples R China
Source
PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024 | 2024
Keywords
global information; semantic segmentation; Swin Transformer
DOI
10.1109/CSCWD61410.2024.10580494
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Global context is crucial for the precise segmentation of remote sensing images. However, the large volume and high spatial resolution of remote sensing images make efficient analysis of the entire scene challenging for most convolutional neural network (CNN)-based methods. To address this issue, we propose the Feature Enhancement Swin Transformer (FEST), a framework for semantic segmentation of remote sensing images. First, we use the Swin Transformer as the encoder and incorporate a Global Information Enhancement Model (GIEM) within each Swin Transformer block to reduce information loss and encode more accurate spatial information. Second, we introduce a decoding structure called the Enhanced Feature Fusion Module (EFFM), which adds enhanced channel and spatial attention modules to retain local information while capturing extensive contextual information. Finally, to achieve competitive performance, we jointly supervise the model with Dice loss and cross-entropy loss. We comprehensively evaluated FEST on the ISPRS Vaihingen and Potsdam datasets. The results indicate that our approach achieves significant improvements in semantic segmentation over existing methods.
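The joint supervision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function names, the equal weighting (`dice_weight=1.0`), the smoothing constant, and the class-averaged soft Dice form are illustrative choices, not the authors' implementation. Pixels are represented as a flat list of per-class logits with one integer label each.

```python
import math

def softmax(logits):
    """Convert one pixel's class logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability of the true class over all pixels."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def dice_loss(probs, labels, num_classes, smooth=1.0):
    """One minus the soft Dice coefficient, averaged over classes."""
    dice_sum = 0.0
    for c in range(num_classes):
        # Predicted probability mass that falls on pixels truly of class c.
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * intersection + smooth) / (pred_mass + true_mass + smooth)
    return 1.0 - dice_sum / num_classes

def joint_loss(logits, labels, num_classes, dice_weight=1.0):
    """Cross-entropy plus weighted Dice loss; the weighting is an assumption."""
    probs = [softmax(pixel) for pixel in logits]
    return cross_entropy(probs, labels) + dice_weight * dice_loss(probs, labels, num_classes)

# Two pixels, two classes: confident correct vs. confident wrong predictions.
labels = [0, 1]
good = joint_loss([[20.0, -20.0], [-20.0, 20.0]], labels, num_classes=2)
bad = joint_loss([[-20.0, 20.0], [20.0, -20.0]], labels, num_classes=2)
```

In this sketch the cross-entropy term penalizes each pixel independently, while the Dice term scores region overlap per class, which is one common reason the two are combined for segmentation with imbalanced classes.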
Pages: 1177-1182
Number of pages: 6