MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

Cited by: 6
Authors
Wang, Tao [1, 2, 3]
Xu, Chao [1 ]
Liu, Bin [1 ]
Yang, Guang [1 ]
Zhang, Erlei [1 ]
Niu, Dangdang [1 ]
Zhang, Hongming [1 ]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] Tarim Univ, Coll Informat Engn, Alaer 843300, Peoples R China
[3] Tarim Univ, Key Lab Tarim Oasis Agr, Minist Educ, Alaer 843300, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Remote sensing; Feature extraction; Semantics; Task analysis; Semantic segmentation; Computer vision; Convolutional attention; cross-shaped self-attention; remote sensing image; transformer
DOI
10.1109/JSTARS.2024.3397488
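Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract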
Semantic segmentation is a crucial step in the intelligent interpretation of high-resolution remote sensing images (HRSIs). Convolutional neural networks and transformers are widely used for semantic feature extraction in remote sensing images, but the former inevitably has limitations in modeling long-range spatial dependency information, while the latter lacks the ability to learn local semantic features. Existing remote sensing image segmentation methods are optimized and modified based on the backbone networks used in natural image processing. Despite achieving relatively good results, the complexity of their network structures leads to high computational costs and limited improvements in accuracy. These methods have limited boundary distinction for ground objects in complex environments, especially for small targets. In this article, we propose an efficient semantic segmentation architecture for HRSIs called MCAT-UNet, which utilizes multiscale convolutional attention (MSCA) and the cross-shaped window transformer (CSWT) to reconstruct UNet. The encoder stacks a sequence of MSCA to exploit the advantages of convolution attention to encode context information more effectively and enhance hierarchical multiscale representation learning. The proposed U-shaped decoder integrates three skip connections using the CSWT block to further capture long-range spatial dependency and gradually restore the size of the feature map. We benchmark MCAT-UNet on three common datasets, Potsdam, Vaihingen, and LoveDA. Comprehensive experiments and extensive ablation studies show that our proposed MCAT-UNet outperforms previous state-of-the-art methods with remarkable performance.
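The abstract does not spell out how the cross-shaped window attention in the CSWT block is computed. As a rough illustration of the general idea only, the sketch below lists which positions a single token can attend to when attention is restricted to the horizontal and vertical stripes that pass through it. The function names, the grid size, and the stripe width are assumptions made for this example and are not taken from the paper; in practice the two stripe directions are typically handled by separate groups of attention heads, and the set shown here is their combined coverage.

```python
# Illustrative sketch only: positions covered by a cross-shaped attention
# window for one query token. All names and sizes here are hypothetical.

def stripe_band(index, stripe_width):
    """Return the half-open range [start, stop) of the stripe containing index."""
    start = (index // stripe_width) * stripe_width
    return start, start + stripe_width


def cross_shaped_window(row, col, height, width, stripe_width):
    """Union of the horizontal and vertical stripes passing through (row, col)."""
    r0, r1 = stripe_band(row, stripe_width)
    c0, c1 = stripe_band(col, stripe_width)
    # Horizontal stripe: a band of rows spanning the full width of the map.
    horizontal = {(r, c) for r in range(r0, min(r1, height)) for c in range(width)}
    # Vertical stripe: a band of columns spanning the full height of the map.
    vertical = {(r, c) for r in range(height) for c in range(c0, min(c1, width))}
    return horizontal | vertical


def render(row, col, height, width, stripe_width):
    """Draw the window as text: Q is the query, # is attended, . is ignored."""
    window = cross_shaped_window(row, col, height, width, stripe_width)
    lines = []
    for r in range(height):
        cells = []
        for c in range(width):
            if (r, c) == (row, col):
                cells.append("Q")
            elif (r, c) in window:
                cells.append("#")
            else:
                cells.append(".")
        lines.append("".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(row=3, col=4, height=8, width=8, stripe_width=2))
```

The token reaches the full height and width of the feature map along its stripes without attending to every position, which is how this family of attention mechanisms captures long-range dependency at lower cost than full self-attention.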
Pages: 9745 - 9758
Page count: 14
Related papers
50 in total
  • [31] High-Resolution Boundary-Constrained and Context-Enhanced Network for Remote Sensing Image Segmentation
    Xu, Yizhe
    Jiang, Jie
    REMOTE SENSING, 2022, 14 (08)
  • [32] High-Resolution Remote Sensing Image Change Detection Based on Cross-Mixing Attention Network
    Wu, Chaoyang
    Yang, Le
    Guo, Cunge
    Wu, Xiaosuo
    ELECTRONICS, 2024, 13 (03)
  • [33] A feature enhancement network combining UNet and vision transformer for building change detection in high-resolution remote sensing images
    Yu Sun
    Yujuan Zhao
    Xianwei Han
    Wei Gao
    Yunliang Hu
    Yimin Zhang
    Neural Computing and Applications, 2025, 37 (3) : 1429 - 1456
  • [34] High-Resolution Remote Sensing Image Semantic Segmentation via Multiscale Context and Linear Self-Attention
    Yin, Peng
    Zhang, Dongmei
    Han, Wei
    Li, Jiang
    Cheng, Jianmei
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 9174 - 9185
  • [35] High-Resolution Remote Sensing Image Segmentation Algorithm Based on Improved Feature Extraction and Hybrid Attention Mechanism
    Huang, Min
    Dai, Wenhui
    Yan, Weihao
    Wang, Jingyang
    ELECTRONICS, 2023, 12 (17)
  • [36] Urban Vegetation Extraction from High-Resolution Remote Sensing Imagery on SD-UNet and Vegetation Spectral Features
    Lin, Na
    Quan, Hailin
    He, Jing
    Li, Shuangtao
    Xiao, Maochi
    Wang, Bin
    Chen, Tao
    Dai, Xiaoai
    Pan, Jianping
    Li, Nanjie
    REMOTE SENSING, 2023, 15 (18)
  • [37] An Efficient File Reading Platform for High-resolution Remote Sensing Image
    Zhang, Libao
    Yang, Kaina
    Yu, Xianchuan
    SATELLITE DATA COMPRESSION, COMMUNICATIONS, AND PROCESSING VIII, 2012, 8514
  • [38] AGs-Unet: Building Extraction Model for High Resolution Remote Sensing Images Based on Attention Gates U Network
    Yu, Mingyang
    Chen, Xiaoxian
    Zhang, Wenzhuo
    Liu, Yaohui
    SENSORS, 2022, 22 (08)
  • [39] Ensemble of features for efficient classification of high-resolution remote sensing image
    Nisia, Gladima T.
    Rajesh, S.
    EUROPEAN JOURNAL OF REMOTE SENSING, 2022, 55 (01) : 326 - 337
  • [40] SEGMENTATION METHOD OF HIGH-RESOLUTION REMOTE SENSING IMAGE FOR FAST TARGET RECOGNITION
    Li, Chenming
    Gao, Hongmin
    Yang, Yao
    Qu, Xiaoyu
    Yuan, Wenjing
    INTERNATIONAL JOURNAL OF ROBOTICS & AUTOMATION, 2019, 34 (03) : 216 - 224