TCNet: Multiscale Fusion of Transformer and CNN for Semantic Segmentation of Remote Sensing Images

Cited by: 12
Authors
Xiang, Xuyang [1]
Gong, Wenping [1]
Li, Shuailong [1]
Chen, Jun [2]
Ren, Tianhe [1]
Affiliations
[1] China Univ Geosci, Fac Engn, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China
Keywords
Convolutional Neural Network (CNN); feature fusion; remote sensing images; semantic segmentation; Transformer
DOI
10.1109/JSTARS.2024.3349625
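Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract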
Semantic segmentation of remote sensing images plays a critical role in areas such as urban change detection, environmental protection, and geohazard identification. Convolutional Neural Networks (CNNs) have been extensively employed for semantic segmentation over the past few years; however, because the convolution operation is local, CNNs struggle to extract the global context of remote sensing images, which is vital for semantic segmentation. The recently developed Transformer, by contrast, is equipped with powerful global modeling capabilities. This article proposes a network called TCNet, which adopts a parallel-in-branch architecture of the Transformer and the CNN. As such, TCNet takes advantage of both the Transformer and the CNN, so that both global context and low-level spatial details can be captured in a much shallower manner. In addition, a novel fusion technique called Interactive Self-attention is proposed to fuse the multilevel features extracted from both branches. To bridge the semantic gap between regions, a skip connection module called Windowed Self-attention Gating is further developed and added to the progressive upsampling network. Experiments on three public datasets (i.e., Bijie Landslide Dataset, WHU Building Dataset, and Massachusetts Buildings Dataset) show that TCNet yields superior performance over state-of-the-art models. The IoU values obtained by TCNet for these three datasets are 75.34% (ranked first among 10 models compared), 91.16% (ranked first among 13 models compared), and 76.21% (ranked first among 13 models compared), respectively.
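The abstract reports results as IoU (intersection over union). As a minimal sketch of that metric for a single foreground class, the snippet below compares two binary masks. The function name `binary_iou`, the nested-list mask format, and the handling of two empty masks are assumptions made for illustration; the record does not describe the paper's exact evaluation protocol (for example, how classes or images are averaged).

```python
def binary_iou(pred, target):
    """IoU between two binary masks given as equally shaped nested lists of 0/1.

    Illustrative helper only; not taken from the TCNet paper.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is foreground in both masks
            if p or t:
                union += 1  # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))  # 0.5
```

A reported value such as 75.34% corresponds to a ratio of 0.7534 on this scale.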
Pages: 3123-3136
Page count: 14