A shape-aware enhancement Vision Transformer for building extraction from remote sensing imagery

Times cited: 1
Authors
Yiming, Tuerhong [1]
Tang, Xiaoyan [1, 2]
Shang, Haibin [1]
Affiliations
[1] Xinjiang University, College of Civil Engineering and Architecture, Urumqi, Xinjiang, People's Republic of China
[2] Xinjiang University, College of Civil Engineering and Architecture, Urumqi 830000, Xinjiang, People's Republic of China
Keywords
Deep learning; building extraction; Vision Transformer; long-range independence; shape feature enhancement; footprint extraction; segmentation; network
DOI
10.1080/01431161.2024.2307325
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology]
Subject classification codes
081102; 0816; 081602; 083002; 1404
Abstract
Convolutional neural networks (CNNs) have been used for several years to extract buildings from remote sensing images. The Vision Transformer (ViT) has recently demonstrated superior performance to CNNs, owing to its ability to model long-range dependencies through self-attention. However, most existing ViT models do not explicitly enhance the shape information of building objects, which results in insufficient fine-grained segmentation. To address this limitation, we construct an efficient dual-path ViT framework for building segmentation, termed the shape-aware enhancement Vision Transformer (SAEViT). Our approach incorporates a shape-aware enhancement module (SAEM) that perceives and enhances the shape features of buildings using convolutional kernels of multiple shapes. We also introduce multi-pooling channel attention (MPCA), which exploits channel-wise information without squeezing the channel dimension. Furthermore, we propose a progressive aggregation upsampling model (PAUM) in the decoder, which aggregates multilevel features through progressive upsampling combined with a soft-pool algorithm operating on the channel axis. We evaluate our model on three public building datasets. The experimental results show that SAEViT achieves significant improvements on these datasets, confirming its efficacy. Compared with several state-of-the-art models, SAEViT achieves the best overall performance.
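The abstract says PAUM uses a soft-pool algorithm operating on the channel axis but gives no formula. As a minimal sketch, assuming this means SoftPool-style softmax-weighted pooling computed over the C channels at each pixel (rather than over a spatial window), the operation reduces a C×H×W feature map to a single H×W map. The function name `softpool_channel` and the nested-list [channel][row][col] layout are hypothetical illustrations, not the authors' implementation; how PAUM combines this output with the upsampled features is not described in the abstract.

```python
import math
from typing import List

# Hypothetical layout: a feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def softpool_channel(x: FeatureMap) -> List[List[float]]:
    """Softmax-weighted pooling along the channel axis (assumed reading of 'soft-pool').

    At every spatial position (i, j), the C activations x[c][i][j] are weighted
    by their softmax, w_c = exp(x_c) / sum_k exp(x_k), and summed:
        y[i][j] = sum_c w_c * x_c
    The result is a single-channel map of shape [H][W].
    """
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [x[c][i][j] for c in range(channels)]
            m = max(values)  # subtract the max before exp for numerical stability
            exps = [math.exp(v - m) for v in values]
            total = sum(exps)
            # Softmax-weighted average of the channel activations at (i, j).
            out[i][j] = sum(e * v for e, v in zip(exps, values)) / total
    return out


# Usage: two channels over a 1x2 map.
y = softpool_channel([[[1.0, 0.0]], [[1.0, 2.0]]])
print(y)
```

Because the softmax weights grow with the activation, each pooled value lies between the plain channel mean and the channel maximum, so this sketch keeps strong responses without discarding weaker channels the way max pooling would.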
Pages: 1250-1276
Page count: 27