Asymmetric Network Combining CNN and Transformer for Building Extraction from Remote Sensing Images

Cited by: 6
Authors
Chang, Junhao [1 ]
Cen, Yuefeng [1 ]
Cen, Gang [1 ]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
building extraction; bilateral hybrid attention; convolutional neural network; remote sensing; semantic segmentation; transformer;
DOI
10.3390/s24196198
CLC number
O65 [Analytical Chemistry];
Subject classification codes
070302 ; 081704 ;
Abstract
The accurate extraction of buildings from remote sensing images is crucial in fields such as 3D urban planning, disaster detection, and military reconnaissance. In recent years, Transformer-based models have performed well at global information processing and contextual relationship modeling, but they suffer from high computational cost and a limited ability to capture local information. In contrast, convolutional neural networks (CNNs) are very effective at extracting local features but have a limited ability to model global information. In this paper, an asymmetric network (CTANet) that combines the advantages of CNNs and Transformers is proposed for efficient building extraction. Specifically, CTANet employs ConvNeXt as the encoder to extract features and pairs it with an efficient bilateral hybrid attention transformer (BHAFormer) designed as the decoder. The BHAFormer establishes global dependencies from both texture-edge and background-information perspectives to extract buildings more accurately while maintaining a low computational cost. Additionally, a multiscale mixed attention mechanism module (MSM-AMM) is introduced to learn multiscale semantic information and channel representations of the encoder features, reducing noise interference and compensating for the information lost during downsampling. Experimental results show that, compared with other state-of-the-art methods, the proposed model achieves the best F1-score (86.7%, 95.74%, and 90.52%) and IoU (76.52%, 91.84%, and 82.68%) on the Massachusetts building dataset, the WHU building dataset, and the Inria aerial image labeling dataset, respectively.
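The abstract describes the architecture only at a high level. Below is a minimal PyTorch-style sketch of such an asymmetric CNN-encoder / Transformer-decoder for binary building masks. It is not the authors' implementation: the module names (ConvNeXtBlock, BHABlock, CTANetSketch), channel widths, and single-scale decoding are all assumptions, and the published BHAFormer and MSM-AMM modules are only approximated by a plain self-attention block.

```python
# Illustrative sketch (not the authors' code): a ConvNeXt-style CNN encoder
# extracts downsampled features, a lightweight transformer block refines them
# globally, and a 1x1 head predicts building logits upsampled to input size.
import torch
import torch.nn as nn


class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt block: depthwise conv + pointwise MLP + residual."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.GroupNorm(1, dim)  # stand-in for LayerNorm on NCHW tensors
        self.pwconv = nn.Sequential(
            nn.Conv2d(dim, 4 * dim, 1), nn.GELU(), nn.Conv2d(4 * dim, dim, 1)
        )

    def forward(self, x):
        return x + self.pwconv(self.norm(self.dwconv(x)))


class BHABlock(nn.Module):
    """Placeholder for the BHAFormer decoder: plain multi-head self-attention
    over flattened spatial tokens plus an MLP (the bilateral hybrid attention
    design in the paper is not reproduced here)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, HW, C) token sequence
        q = self.norm1(t)
        t = t + self.attn(q, q, q)[0]          # global dependencies across tokens
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, h, w)


class CTANetSketch(nn.Module):
    """Asymmetric network: convolutional encoder, attention-based decoder."""
    def __init__(self, in_ch=3, dims=(32, 64, 128)):
        super().__init__()
        stages, prev = [], in_ch
        for d in dims:                          # each stage: 2x downsample + ConvNeXt block
            stages.append(nn.Sequential(nn.Conv2d(prev, d, 2, stride=2), ConvNeXtBlock(d)))
            prev = d
        self.encoder = nn.ModuleList(stages)
        self.decoder = BHABlock(dims[-1])       # transformer refinement at the bottleneck
        self.head = nn.Conv2d(dims[-1], 1, 1)   # 1-channel building logits

    def forward(self, x):
        size = x.shape[-2:]
        for stage in self.encoder:
            x = stage(x)
        x = self.decoder(x)
        logits = self.head(x)
        return nn.functional.interpolate(logits, size=size, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = CTANetSketch()
    out = model(torch.randn(1, 3, 256, 256))    # -> (1, 1, 256, 256) building logits
    print(out.shape)
```

The asymmetry lies in pairing a purely convolutional encoder with an attention-based decoder; the actual CTANet additionally fuses multi-scale encoder features through MSM-AMM, which this sketch omits.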
Pages: 23