Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Times cited: 196
Authors
Zhang, Cheng [1]
Jiang, Wanshou [1, 2]
Zhang, Yuan [1]
Wang, Wei [1]
Zhao, Qing [1]
Wang, Chenjie [1]
Affiliations
[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Peoples R China
[2] Collaborat Innovat Ctr Geospatial Technol, Wuhan 430079, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Remote sensing; Decoding; Convolutional neural networks; Boundary detection; semantic segmentation; squeeze-and-excitation (SE) block; Swin transformer; very high resolution (VHR) remote sensing imagery;
DOI
10.1109/TGRS.2022.3144894
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
This article presents a transformer and convolutional neural network (CNN) hybrid deep neural network for semantic segmentation of very high resolution (VHR) remote sensing imagery. The model follows an encoder-decoder structure. The encoder module uses a new universal backbone Swin transformer to extract features to achieve better long-range spatial dependencies modeling. The decoder module draws on some effective blocks and successful strategies of CNN-based models in remote sensing image segmentation. In the middle of the framework, an atrous spatial pyramid pooling block based on depthwise separable convolution (SASPP) is applied to obtain a multiscale context. A U-shaped decoder is used to gradually restore the size of the feature maps. Three skip connections are built between the encoder and decoder feature maps of the same size to maintain the transmission of local details and enhance the communication of multiscale features. A squeeze-and-excitation (SE) channel attention block is added before segmentation for feature augmentation. An auxiliary boundary detection branch is combined to provide edge constraints for semantic segmentation. Extensive ablation experiments were conducted on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam benchmarks to test the effectiveness of multiple components of the network. At the same time, the proposed method is compared with the current state-of-the-art methods on the two benchmarks. The proposed hybrid network achieved the second highest overall accuracy (OA) on both the Potsdam and Vaihingen benchmarks (code and models are available at https://github.com/zq7734509/mmsegmentation-multilayer).
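As a rough illustration of the squeeze-and-excitation (SE) channel attention mentioned in the abstract, the sketch below shows the generic SE computation in plain Python: global average pooling per channel, a two-layer bottleneck with ReLU and sigmoid, and channel-wise rescaling. It is not taken from the authors' repository. The function name `se_block`, the nested-list tensor layout, the weight shapes, and the omission of bias terms are all assumptions made for this example.

```python
import math


def se_block(x, w1, w2):
    """Generic SE channel attention (hypothetical helper, biases omitted).

    x  : feature map as nested lists with shape [C][H][W]
    w1 : reduction weights with shape [C_r][C]
    w2 : expansion weights with shape [C][C_r]
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: fully connected expansion followed by sigmoid.
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
         for row in w2]
    # Scale: multiply every value in channel c by its gate s[c].
    y = [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    return y, s


# Toy input: 2 channels of size 2x2, reduced to 1 hidden unit.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [2.0, 2.0]]]
w1 = [[1.0, 1.0]]
w2 = [[0.0], [0.0]]  # zero weights, so every gate is sigmoid(0)
y, gates = se_block(x, w1, w2)
```

In a real network the two weight matrices are learned, so the gates differ per channel and per input; the zero weights here only make the toy result easy to check by hand.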
Pages: 20