STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation

Cited by: 143
Authors
Gao, Liang [1 ,2 ,3 ]
Liu, Hui [2 ,3 ,4 ]
Yang, Minhang [1 ,2 ,3 ]
Chen, Long [1 ,2 ,3 ]
Wan, Yaling [1 ,2 ,3 ]
Xiao, Zhengqing [5 ]
Qian, Yurong [1 ,2 ,3 ]
Affiliations
[1] Xinjiang Univ, Coll Software, Urumqi 830008, Peoples R China
[2] Key Lab Signal Detect & Proc Xinjiang Uygur Auton, Urumqi 830014, Peoples R China
[3] Key Lab Software Engn, Urumqi 830008, Peoples R China
[4] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830014, Peoples R China
[5] Xinjiang Univ, Coll Math & Syst Sci, Urumqi 830014, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Remote sensing; Transformers; Semantics; Image segmentation; Computational modeling; Feature extraction; Context modeling; self-attention; semantic segmentation; Transformer;
DOI
10.1109/JSTARS.2021.3119654
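Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract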
Convolutional neural networks (CNNs) have driven applied research on remote sensing images. Because their receptive field has a fixed size, however, CNNs cannot model global semantic relevance. Transformer-based models, which rely on self-attention, can model global semantic information, but the patch-based computation that the Transformer uses for self-attention ignores the spatial information inside each patch. To address these issues, we propose STransFuse, a new semantic segmentation method for remote sensing images. The model combines the benefits of the Transformer and the CNN to improve segmentation quality across various remote sensing images. Unlike earlier techniques based on Transformer model fusion, we employ a staged model to extract coarse-grained and fine-grained feature representations at various semantic scales. To take full advantage of the features acquired at different stages, we design an adaptive fusion module that uses a self-attention mechanism to adaptively fuse the semantic information between features at different scales. The overall accuracy (OA) of the proposed model is 1.36% higher than the baseline on the Vaihingen dataset and 1.27% higher than the baseline on the Potsdam dataset. The STransFuse model also compares favorably with other advanced models.
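The abstract reports its results as overall accuracy (OA). As a minimal sketch, assuming the usual definition of OA as the fraction of pixels whose predicted class matches the reference label, the metric can be computed as below. The function name `overall_accuracy` and the toy label lists are hypothetical illustrations, not taken from the paper, and the paper's exact evaluation protocol (for example, which classes or boundary pixels are excluded) is not stated in this record.

```python
def overall_accuracy(pred, truth):
    """Return the fraction of pixels whose predicted class equals the reference class.

    `pred` and `truth` are flat sequences of class indices of equal length
    (a 2D label map would be flattened before calling this function).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must not be empty")
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    return correct / len(pred)


# Toy example (hypothetical labels): 4 of the 6 pixels agree.
pred = [0, 1, 1, 2, 2, 0]
truth = [0, 1, 2, 2, 2, 1]
oa = overall_accuracy(pred, truth)
print(f"OA = {oa:.4f}")
```

Under this definition, an OA gain such as those quoted above means that a correspondingly larger share of all evaluated pixels is labeled correctly.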
Pages: 10990-11003
Page count: 14