A shape-aware enhancement Vision Transformer for building extraction from remote sensing imagery

Cited by: 1

Authors:

Yiming, Tuerhong ^{[1]}

Tang, Xiaoyan ^{[1,2]}

Shang, Haibin ^{[1]}

Affiliations:

[1] Xinjiang Univ, Coll Civil Engn & Architecture, Urumqi, Xinjiang, Peoples R China

[2] Xinjiang Univ, Coll Civil Engn & Architecture, Urumqi 830000, Xinjiang, Peoples R China

Source:

INTERNATIONAL JOURNAL OF REMOTE SENSING | 2024 / Vol. 45 / No. 04

Keywords:

Deep learning; building extraction; Vision Transformer; long-range independence; shape feature enhancement; FOOTPRINT EXTRACTION; SEGMENTATION; NETWORK;

DOI:

10.1080/01431161.2024.2307325

Chinese Library Classification number:

TP7 [Remote sensing technology];

Discipline classification codes:

081102 ; 0816 ; 081602 ; 083002 ; 1404 ;

Abstract:

Convolutional neural networks (CNNs) have been used for several years to extract buildings from remote sensing images. Vision Transformer (ViT) has recently demonstrated superior performance over CNNs, thanks to its ability to model long-range dependencies through self-attention mechanisms. However, most existing ViT models lack shape information enhancement for building objects, resulting in insufficient fine-grained segmentation. To address this limitation, we construct an efficient dual-path ViT framework for building segmentation, termed shape-aware enhancement Vision Transformer (SAEViT). Our approach incorporates a shape-aware enhancement module (SAEM) that perceives and enhances the shape features of buildings using convolutional kernels of multiple shapes. We also introduce multi-pooling channel attention (MPCA) to exploit channel-wise information without squeezing the channel dimension. Furthermore, we propose a progressive aggregation upsampling model (PAUM) in the decoder that aggregates multilevel features through progressive upsampling, coupled with the soft-pool algorithm operating on the channel axis. We evaluate our model on three public building datasets. The experimental results show that SAEViT obtains a significant improvement on these datasets, confirming its efficacy. Compared with several state-of-the-art models, SAEViT achieves superior overall performance.

Citation

Pages: 1250-1276

Page count: 27
