MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

Cited by: 6
Authors
Wang, Tao [1 ,2 ,3 ]
Xu, Chao [1 ]
Liu, Bin [1 ]
Yang, Guang [1 ]
Zhang, Erlei [1 ]
Niu, Dangdang [1 ]
Zhang, Hongming [1 ]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] Tarim Univ, Coll Informat Engn, Alaer 843300, Peoples R China
[3] Tarim Univ, Key Lab Tarim Oasis Agr, Minist Educ, Alaer 843300, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Remote sensing; Feature extraction; Semantics; Task analysis; Semantic segmentation; Computer vision; Convolutional attention; cross-shaped self-attention; remote sensing image; transformer
DOI
10.1109/JSTARS.2024.3397488
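Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract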
Semantic segmentation is a crucial step in the intelligent interpretation of high-resolution remote sensing images (HRSIs). Convolutional neural networks and transformers are widely used for semantic feature extraction in remote sensing images, but the former inevitably has limitations in modeling long-range spatial dependency information, while the latter lacks the ability to learn local semantic features. Existing remote sensing image segmentation methods are optimized and modified based on the backbone networks used in natural image processing. Despite achieving relatively good results, the complexity of their network structures leads to high computational costs and limited improvements in accuracy. These methods have limited boundary distinction for ground objects in complex environments, especially for small targets. In this article, we propose an efficient semantic segmentation architecture for HRSIs called MCAT-UNet, which utilizes multiscale convolutional attention (MSCA) and the cross-shaped window transformer (CSWT) to reconstruct UNet. The encoder stacks a sequence of MSCA to exploit the advantages of convolution attention to encode context information more effectively and enhance hierarchical multiscale representation learning. The proposed U-shaped decoder integrates three skip connections using the CSWT block to further capture long-range spatial dependency and gradually restore the size of the feature map. We benchmark MCAT-UNet on three common datasets, Potsdam, Vaihingen, and LoveDA. Comprehensive experiments and extensive ablation studies show that our proposed MCAT-UNet outperforms previous state-of-the-art methods with remarkable performance.
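The abstract does not spell out how the CSWT block forms its attention region. As a rough illustration only, and assuming it follows the general cross-shaped window idea of combining one horizontal stripe and one vertical stripe around each query position, the sketch below lists which feature-map positions a single query could attend to. The function name `cross_shaped_window` and the `stripe` parameter are hypothetical and are not taken from the paper.

```python
def cross_shaped_window(row, col, height, width, stripe):
    """Return the set of (r, c) positions visible to a query at (row, col).

    Illustrative assumption: the feature map is cut into horizontal bands and
    vertical bands that are each `stripe` cells thick, and the query sees the
    union of the band of rows and the band of columns it falls in.
    """
    if stripe <= 0:
        raise ValueError("stripe must be positive")
    if not (0 <= row < height and 0 <= col < width):
        raise ValueError("query position lies outside the feature map")

    # Horizontal stripe: the band of rows containing the query, at full width.
    r0 = (row // stripe) * stripe
    horizontal = {
        (r, c)
        for r in range(r0, min(r0 + stripe, height))
        for c in range(width)
    }

    # Vertical stripe: the band of columns containing the query, at full height.
    c0 = (col // stripe) * stripe
    vertical = {
        (r, c)
        for r in range(height)
        for c in range(c0, min(c0 + stripe, width))
    }

    # The union of the two stripes is the cross-shaped region.
    return horizontal | vertical


if __name__ == "__main__":
    window = cross_shaped_window(row=3, col=5, height=8, width=8, stripe=2)
    print(len(window), "of", 8 * 8, "positions are visible to the query")
```

Under these assumptions the query reaches positions along its whole row band and column band, which gives long-range coverage in both directions while touching fewer positions than full global attention over the feature map.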
Pages: 9745-9758
Page count: 14