ResU-Former: Advancing Remote Sensing Image Segmentation with Swin Residual Transformer for Precise Global-Local Feature Recognition and Visual-Semantic Space Learning

Cited by: 3
Authors
Li, Hanlu [1 ]
Li, Lei [2 ]
Zhao, Liangyu [1 ]
Liu, Fuxiang [1 ]
Affiliations
[1] Beijing Inst Technol, Minist Educ, Key Lab Dynam & Control Flight Vehicle, Beijing 100081, Peoples R China
[2] Aerosp Tianmu Chongqing Satellite Sci & Technol Co, Chongqing 400000, Peoples R China
Keywords
semantic segmentation; transformer; balance between visual and semantic space; enhancement of both global and local aspects
DOI
10.3390/electronics13020436
CLC classification number
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
In the field of remote sensing image segmentation, achieving high accuracy and efficiency in diverse and complex environments remains a challenge. In addition, there is a notable imbalance between the low-level visual features and the high-level semantic information embedded in remote sensing images, and improvements in both global and local recognition are limited by multi-scale scenery and imbalanced class distributions. These challenges are further compounded by inaccurate local localization in segmentation and the oversight of small-scale features. To balance visual space and semantic space, to increase both global and local recognition accuracy, and to enhance the flexibility of input-scale features while supplementing global contextual information, we propose a U-shaped hierarchical structure called ResU-Former. Its specially designed fundamental unit, the Swin Residual Transformer block, enables efficient segmentation of objects of varying sizes against complex backgrounds, a common scenario in remote sensing datasets, and allows ResU-Former to fully exploit and propagate feature information for semantic segmentation in complex remote sensing scenes. Experimental results on benchmark datasets, such as an overall accuracy of 81.5% on Vaihingen, demonstrate ResU-Former's potential to improve segmentation across a variety of remote sensing applications.
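The abstract does not give implementation details of the Swin Residual Transformer block. As a rough illustration of the general idea it describes, a Swin-style pre-norm attention/MLP block wrapped in an additional residual (skip) connection, a minimal single-window sketch in NumPy might look like the following. All names, weight shapes, and the placement of the outer skip connection are illustrative assumptions, not the paper's actual design (shifted-window partitioning and relative position bias are omitted for brevity).

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over one window of tokens.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def swin_residual_block(x, params):
    # Standard pre-norm transformer sub-blocks, each with its own residual.
    h = x + self_attention(layer_norm(x), *params["attn"])
    h = h + np.maximum(0.0, layer_norm(h) @ params["w1"]) @ params["w2"]
    # Hypothetical extra residual connection around the whole block,
    # suggested by the name "Swin Residual Transformer block".
    return x + h

# Example usage: one window of 16 tokens with 32 channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 32))
params = {
    "attn": tuple(0.1 * rng.normal(size=(32, 32)) for _ in range(3)),
    "w1": 0.1 * rng.normal(size=(32, 64)),
    "w2": 0.1 * rng.normal(size=(64, 32)),
}
y = swin_residual_block(x, params)  # same shape as the input, (16, 32)
```

In a U-shaped encoder–decoder such a block would replace the plain convolutional unit at each resolution stage, with skip connections between encoder and decoder levels as in U-Net.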
Pages: 21