MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

Times cited: 6
Authors
Wang, Tao [1,2,3]
Xu, Chao [1]
Liu, Bin [1]
Yang, Guang [1]
Zhang, Erlei [1]
Niu, Dangdang [1]
Zhang, Hongming [1]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] Tarim Univ, Coll Informat Engn, Alaer 843300, Peoples R China
[3] Tarim Univ, Key Lab Tarim Oasis Agr, Minist Educ, Alaer 843300, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Remote sensing; Feature extraction; Semantics; Task analysis; Semantic segmentation; Computer vision; Convolutional attention; cross-shaped self-attention; remote sensing image; transformer
DOI
10.1109/JSTARS.2024.3397488
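Chinese Library Classification (CLC) number
TM [Electrical engineering technology]; TN [Electronic technology, communication technology]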
Discipline classification code
0808; 0809
Abstract
Semantic segmentation is a crucial step in the intelligent interpretation of high-resolution remote sensing images (HRSIs). Convolutional neural networks (CNNs) and transformers are widely used to extract semantic features from remote sensing images, but CNNs are inherently limited in modeling long-range spatial dependencies, while transformers lack the ability to learn local semantic features. Existing remote sensing image segmentation methods are typically built by optimizing and modifying backbone networks designed for natural image processing. Although these methods achieve relatively good results, their complex network structures incur high computational costs while yielding only limited gains in accuracy. They also distinguish poorly between the boundaries of ground objects in complex environments, especially for small targets. In this article, we propose MCAT-UNet, an efficient semantic segmentation architecture for HRSIs that reconstructs UNet using multiscale convolutional attention (MSCA) and the cross-shaped window transformer (CSWT). The encoder stacks a sequence of MSCA blocks, exploiting convolutional attention to encode context information more effectively and to enhance hierarchical multiscale representation learning. The proposed U-shaped decoder integrates three skip connections through CSWT blocks to further capture long-range spatial dependencies while gradually restoring the spatial size of the feature map. We benchmark MCAT-UNet on three common datasets: Potsdam, Vaihingen, and LoveDA. Comprehensive experiments and extensive ablation studies show that MCAT-UNet outperforms previous state-of-the-art methods.
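The record does not give the CSWT formulation. The following is a minimal pure-Python sketch of the cross-shaped window idea, assuming it follows the CSWin Transformer design, in which the attention capacity (heads in CSWin, channels here) is split into two groups: one attends within horizontal stripes and the other within vertical stripes. The stripe width `sw`, the single head, the identity Q/K/V projections, and the function names `_attend` and `cross_shaped_window_attention` are hypothetical simplifications for illustration, not the authors' implementation.

```python
import math
from typing import List

# Feature map stored as nested lists indexed [row][col][channel].
Grid = List[List[List[float]]]


def _attend(tokens: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention over one group of tokens.

    Simplifying assumption: Q = K = V = tokens (single head, no learned
    projections), so only the attention *pattern* is demonstrated.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of the value vectors.
        out.append([sum(wt * v[ch] for wt, v in zip(weights, tokens)) / total
                    for ch in range(d)])
    return out


def cross_shaped_window_attention(x: Grid, sw: int) -> Grid:
    """Hypothetical CSWin-style cross-shaped window attention.

    Channels [0, C/2) attend within horizontal stripes of `sw` rows;
    channels [C/2, C) attend within vertical stripes of `sw` columns.
    Each token therefore sees a cross-shaped region around itself.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    assert h % sw == 0 and w % sw == 0 and c % 2 == 0, "sizes must divide evenly"
    half = c // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]

    # Horizontal stripes: rows [r0, r0 + sw) across the full width.
    for r0 in range(0, h, sw):
        coords = [(i, j) for i in range(r0, r0 + sw) for j in range(w)]
        res = _attend([x[i][j][:half] for i, j in coords])
        for (i, j), v in zip(coords, res):
            out[i][j][:half] = v

    # Vertical stripes: columns [col0, col0 + sw) across the full height.
    for col0 in range(0, w, sw):
        coords = [(i, j) for i in range(h) for j in range(col0, col0 + sw)]
        res = _attend([x[i][j][half:] for i, j in coords])
        for (i, j), v in zip(coords, res):
            out[i][j][half:] = v

    return out
```

In this sketch, with `sw=2` the token at (0, 0) mixes its first half of channels with every token in rows 0-1 and its second half with every token in columns 0-1; a token outside both stripes, such as (3, 3), has no effect on it. Each token attends to `sw * W` or `sw * H` tokens rather than all `H * W`, which is why stripe attention can reach long-range context along rows and columns at lower cost than full global self-attention.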
Pages: 9745-9758
Number of pages: 14