Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Cited by: 402

Authors:

He, Xin [1,2]

Zhou, Yong [1,2]

Zhao, Jiaqi [1,2]

Zhang, Di [1,2]

Yao, Rui [1,2]

Xue, Yong [3,4]

Affiliations:

[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China

[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China

[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China

[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION;

DOI:

10.1109/TGRS.2022.3144165

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on a convolutional neural network (CNN), which struggles to capture global context directly because of the locality of the convolution operation. Inspired by the Swin transformer and its powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based U-shaped network (UNet). ST-UNet constitutes a novel dual encoder structure in which the Swin transformer and a CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlation to enhance the feature representation of occluded objects. Second, we construct a feature compression module (FCM) to reduce the loss of detailed information and retain more small-scale features during patch token downsampling in the Swin transformer, which improves the segmentation accuracy of small-scale ground objects. Finally, as a bridge between the dual encoders, a relational aggregation module (RAM) is designed to hierarchically integrate global dependencies from the Swin transformer into the features from the CNN. ST-UNet brings significant improvement on the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.

Pages: 15

50 items in total

[41] Co-Training Transformer for Remote Sensing Image Classification, Segmentation, and Detection
Li, Qingyun
Chen, Yushi
He, Xin
Huang, Lingbo
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 18
[42] Remote Sensing Image Semantic Segmentation Based on Cascaded Transformer
Wang F.
Ji J.
Wang Y.
IEEE Trans. Artif. Intell., 2024, 8 : 4136 - 4148
[43] A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation
Du, Wen-Liang
Gu, Yang
Zhao, Jiaqi
Zhu, Hancheng
Yao, Rui
Zhou, Yong
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
[44] Semisupervised Multiscale Generative Adversarial Network for Semantic Segmentation of Remote Sensing Image
Wang, Jiaqi
Liu, Bing
Zhou, Yong
Zhao, Jiaqi
Xia, Shixiong
Yang, Yuancan
Zhang, Man
Ming, Liu Ming
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[45] Semantic Co-Occurrence and Relationship Modeling for Remote Sensing Image Segmentation
Zhang, Yinxing
Song, Haochen
Wang, Qingwang
Jin, Pengcheng
Shen, Tao
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 6630 - 6640
[46] SWCGAN: Generative Adversarial Network Combining Swin Transformer and CNN for Remote Sensing Image Super-Resolution
Tu, Jingzhi
Mei, Gang
Ma, Zhengjing
Piccialli, Francesco
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 5662 - 5673
[47] A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation
Ma, Xianping
Zhang, Xiaokang
Pun, Man-On
Liu, Ming
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
[48] Embedding Generalized Semantic Knowledge Into Few-Shot Remote Sensing Segmentation
Wang, Qi
Jia, Yuyu
Huang, Wei
Gao, Junyu
Li, Qiang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
[49] AAFormer: Attention-Attended Transformer for Semantic Segmentation of Remote Sensing Images
Li, Xin
Xu, Feng
Li, Linyang
Xu, Nan
Liu, Fan
Yuan, Chi
Chen, Ziqi
Lyu, Xin
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
[50] Swin-CDSA: The Semantic Segmentation of Remote Sensing Images Based on Cascaded Depthwise Convolution and Spatial Attention Mechanism
Kang, Yuhan
Ji, Jian
Xu, Hekai
Yang, Yong
Chen, Peng
Zhao, Hui
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21