ResU-Former: Advancing Remote Sensing Image Segmentation with Swin Residual Transformer for Precise Global-Local Feature Recognition and Visual-Semantic Space Learning

Cited by: 3
Authors
Li, Hanlu [1 ]
Li, Lei [2 ]
Zhao, Liangyu [1 ]
Liu, Fuxiang [1 ]
Affiliations
[1] Beijing Inst Technol, Minist Educ, Key Lab Dynam & Control Flight Vehicle, Beijing 100081, Peoples R China
[2] Aerosp Tianmu Chongqing Satellite Sci & Technol Co, Chongqing 400000, Peoples R China
Keywords
semantic segmentation; transformer; balance between visual and semantic space; enhancement of both global and local aspects
DOI
10.3390/electronics13020436
CLC classification number
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
In the field of remote sensing image segmentation, achieving high accuracy and efficiency in diverse and complex environments remains a challenge. In addition, there is a notable imbalance between the low-level visual features and the high-level semantic information embedded in remote sensing images, and improvements in both global and local recognition are limited by multi-scale scenery and imbalanced class distributions. These challenges are further compounded by inaccurate local localization in segmentation and the oversight of small-scale features. To balance visual space and semantic space, to increase both global and local recognition accuracy, and to enhance the flexibility of input-scale features while supplementing global contextual information, we propose a U-shaped hierarchical structure called ResU-Former. Its specially designed fundamental unit, the Swin Residual Transformer block, enables efficient segmentation of objects of varying sizes against complex backgrounds, a common scenario in remote sensing datasets, and allows ResU-Former to fully exploit and propagate feature information for semantic segmentation in complex remote sensing scenes. Experimental results on benchmark datasets, such as an overall accuracy of 81.5% on Vaihingen, demonstrate ResU-Former's potential to improve segmentation across a variety of remote sensing applications.
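The abstract does not give implementation details of the Swin Residual Transformer block. As a rough illustration of the general idea it describes, a Swin-style pre-norm attention/MLP block wrapped in an additional residual (skip) connection, a minimal single-window sketch in NumPy might look like the following. All names, weight shapes, and the placement of the outer skip connection are illustrative assumptions, not the paper's actual design (shifted-window partitioning and relative position bias are omitted for brevity).

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over one window of tokens.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def swin_residual_block(x, params):
    # Standard pre-norm transformer sub-blocks, each with its own residual.
    h = x + self_attention(layer_norm(x), *params["attn"])
    h = h + np.maximum(0.0, layer_norm(h) @ params["w1"]) @ params["w2"]
    # Hypothetical extra residual connection around the whole block,
    # suggested by the name "Swin Residual Transformer block".
    return x + h

# Example usage: one window of 16 tokens with 32 channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 32))
params = {
    "attn": tuple(0.1 * rng.normal(size=(32, 32)) for _ in range(3)),
    "w1": 0.1 * rng.normal(size=(32, 64)),
    "w2": 0.1 * rng.normal(size=(64, 32)),
}
y = swin_residual_block(x, params)  # same shape as the input, (16, 32)
```

In a U-shaped encoder–decoder such a block would replace the plain convolutional unit at each resolution stage, with skip connections between encoder and decoder levels as in U-Net.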
Pages: 21