Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Times cited: 402
Authors
He, Xin [1, 2]
Zhou, Yong [1, 2]
Zhao, Jiaqi [1, 2]
Zhang, Di [1, 2]
Yao, Rui [1, 2]
Xue, Yong [3, 4]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China
[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022, Vol. 60
Funding
National Natural Science Foundation of China
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION
DOI
10.1109/TGRS.2022.3144165
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on convolutional neural networks (CNNs), which struggle to capture global context directly because the convolution operation is local. Inspired by the powerful global modeling capability of the Swin transformer, we propose a novel semantic segmentation framework for RS images, called ST-UNet (Swin transformer embedding U-shaped network), which embeds the Swin transformer into the classical CNN-based UNet. ST-UNet adopts a novel dual-encoder structure in which the Swin transformer and a CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlations, enhancing the feature representation of occluded objects. Second, we construct a feature compression module (FCM) that reduces the loss of detailed information and condenses more small-scale features during patch token downsampling in the Swin transformer, improving the segmentation accuracy of small-scale ground objects. Finally, as a bridge between the two encoders, we design a relational aggregation module (RAM) that hierarchically integrates the global dependencies captured by the Swin transformer into the CNN features. ST-UNet achieves significant improvements on both the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.
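The abstract says FCM targets the loss of detail during patch token downsampling in the Swin transformer. For reference, the following is a minimal NumPy sketch of the standard Swin patch-merging step, i.e. the downsampling operation FCM is described as improving on. It is not the authors' FCM. The function names, the parameter-free layer normalization, and the random projection weights are illustrative assumptions.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize over the channel axis (no learnable scale/shift, a simplification)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def patch_merging(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Swin-style patch merging: (H, W, C) tokens -> (H/2, W/2, 2C) tokens.

    `weight` stands in for the learned linear reduction of shape (4C, 2C).
    """
    h, w, c = x.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must both be even")
    if weight.shape != (4 * c, 2 * c):
        raise ValueError("weight must have shape (4C, 2C)")
    # Gather the four tokens of every non-overlapping 2x2 neighbourhood.
    x0 = x[0::2, 0::2, :]  # top-left
    x1 = x[1::2, 0::2, :]  # bottom-left
    x2 = x[0::2, 1::2, :]  # top-right
    x3 = x[1::2, 1::2, :]  # bottom-right
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (H/2, W/2, 4C)
    # Normalize, then project the 4C channels down to 2C.
    return layer_norm(merged) @ weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((8, 8, 16))   # an 8x8 grid of 16-dim tokens
    reduction = rng.standard_normal((64, 32))  # hypothetical random weights
    print(patch_merging(tokens, reduction).shape)  # (4, 4, 32)
```

Each 2 × 2 neighbourhood is concatenated into one 4C-dimensional token and linearly projected to 2C channels, so spatial resolution halves at every stage. Small-scale detail that does not survive this projection is what the abstract says FCM is meant to retain.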
Pages: 15