MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

Times cited: 6
Authors
Wang, Tao [1, 2, 3]
Xu, Chao [1]
Liu, Bin [1]
Yang, Guang [1]
Zhang, Erlei [1]
Niu, Dangdang [1]
Zhang, Hongming [1]
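Affiliations
[1] Northwest A&F University, College of Information Engineering, Yangling 712100, People's Republic of China
[2] Tarim University, College of Information Engineering, Alaer 843300, People's Republic of China
[3] Tarim University, Key Laboratory of Tarim Oasis Agriculture, Ministry of Education, Alaer 843300, People's Republic of China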
Funding
National Natural Science Foundation of China
Keywords
Transformers; Remote sensing; Feature extraction; Semantics; Task analysis; Semantic segmentation; Computer vision; Convolutional attention; cross-shaped self-attention; remote sensing image; transformer
DOI
10.1109/JSTARS.2024.3397488
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
Semantic segmentation is a crucial step in the intelligent interpretation of high-resolution remote sensing images (HRSIs). Convolutional neural networks (CNNs) and transformers are widely used to extract semantic features from remote sensing images, but CNNs are inherently limited in modeling long-range spatial dependencies, while transformers lack the ability to learn local semantic features. Existing remote sensing segmentation methods are built by optimizing and modifying backbone networks designed for natural image processing. Although these methods achieve relatively good results, their complex network structures incur high computational costs while yielding only limited accuracy gains, and they struggle to delineate the boundaries of ground objects in complex scenes, especially small targets. In this article, we propose MCAT-UNet, an efficient semantic segmentation architecture for HRSIs that rebuilds UNet with multiscale convolutional attention (MSCA) and a cross-shaped window transformer (CSWT). The encoder stacks a sequence of MSCA blocks, exploiting convolutional attention to encode context information more effectively and to strengthen hierarchical multiscale representation learning. The proposed U-shaped decoder uses CSWT blocks and integrates three skip connections to further capture long-range spatial dependencies while gradually restoring the feature-map size. We benchmark MCAT-UNet on three common datasets: Potsdam, Vaihingen, and LoveDA. Comprehensive experiments and extensive ablation studies show that MCAT-UNet outperforms previous state-of-the-art methods.
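The abstract names two efficiency mechanisms without giving their formulas. The sketch below illustrates both under stated assumptions, not as the authors' implementation: it assumes the CSWT block uses CSWin-style cross-shaped window attention (half of the heads attend within horizontal stripes and half within vertical stripes of width sw), and that MSCA uses SegNeXt-style depthwise strip convolutions with kernel sizes 7, 11, and 21. All function names, the 64 × 64 feature map, sw = 8, and the 64-channel setting are hypothetical illustration values, not settings reported for MCAT-UNet.

```python
# Hedged sketch with labeled assumptions: two efficiency arguments behind
# MCAT-UNet's building blocks, computed in plain Python.

# --- Assumption 1: CSWT follows CSWin-style cross-shaped window attention ---
# Half of the heads attend within horizontal stripes of height `sw`, the
# other half within vertical stripes of width `sw`. `sw` is illustrative.

def horizontal_stripe(row, height, width, sw):
    """Token coordinates in the horizontal stripe that contains `row`."""
    start = (row // sw) * sw
    return {(r, c) for r in range(start, min(start + sw, height)) for c in range(width)}


def vertical_stripe(col, height, width, sw):
    """Token coordinates in the vertical stripe that contains `col`."""
    start = (col // sw) * sw
    return {(r, c) for r in range(height) for c in range(start, min(start + sw, width))}


def cross_shaped_region(row, col, height, width, sw):
    """Union of both stripes: the cross-shaped region covered for a query at (row, col)."""
    return horizontal_stripe(row, height, width, sw) | vertical_stripe(col, height, width, sw)


def attention_pairs(height, width, sw):
    """Query-key pairs scored per head, for each stripe group and for global attention.

    Assumes height and width are divisible by sw, so every stripe is full.
    """
    assert height % sw == 0 and width % sw == 0, "stripes must tile the feature map"
    n = height * width
    horizontal = n * (sw * width)  # each query scores every key in its row stripe
    vertical = n * (height * sw)   # each query scores every key in its column stripe
    return horizontal, vertical, n * n


# --- Assumption 2: MSCA uses SegNeXt-style depthwise strip convolutions ---
# A 1 x k followed by a k x 1 depthwise kernel replaces a full k x k kernel.
# Whether MCAT-UNet keeps the kernel sizes 7, 11, 21 is an assumption.

def depthwise_weights(kernel_sizes, channels, strip):
    """Weight count (no bias) of depthwise branches with the given kernel sizes."""
    per_channel = sum(2 * k if strip else k * k for k in kernel_sizes)
    return per_channel * channels


if __name__ == "__main__":
    h = w = 64
    sw = 8
    print(len(cross_shaped_region(10, 20, h, w, sw)))       # 960 of 4096 tokens
    print(attention_pairs(h, w, sw))                        # (2097152, 2097152, 16777216)
    print(depthwise_weights((7, 11, 21), 64, strip=True))   # 4992
    print(depthwise_weights((7, 11, 21), 64, strip=False))  # 39104
```

With these illustrative values, each stripe head scores one eighth of the query-key pairs that global attention would (sw/H = 8/64), while the union of the two stripes still spans the full height and width of the map. The strip kernels need 4,992 depthwise weights instead of 39,104 for full square kernels of the same sizes.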
Pages: 9745–9758
Number of pages: 14