Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Cited by: 402
Authors
He, Xin [1, 2]
Zhou, Yong [1, 2]
Zhao, Jiaqi [1, 2]
Zhang, Di [1, 2]
Yao, Rui [1, 2]
Xue, Yong [3, 4]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China
[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60
Funding
National Natural Science Foundation of China
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION;
DOI
10.1109/TGRS.2022.3144165
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on a convolutional neural network (CNN), which struggles to capture global context directly because the convolution operation is local. Inspired by the Swin transformer and its powerful global modeling capability, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based U-shaped network (UNet). ST-UNet adopts a novel dual-encoder structure in which the Swin transformer and a CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlations, enhancing the feature representation of occluded objects. Second, we construct a feature compression module (FCM) that reduces the loss of detailed information and condenses more small-scale features during patch token downsampling in the Swin transformer, improving the segmentation accuracy of small-scale ground objects. Finally, as a bridge between the two encoders, a relational aggregation module (RAM) is designed to hierarchically integrate global dependencies from the Swin transformer into the CNN features. ST-UNet brings significant improvement on the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.
Pages: 15