SARFormer: Segmenting Anything Guided Transformer for semantic segmentation

Cited: 0
Authors
Zhang, Lixin [1 ,2 ]
Huang, Wenteng [1 ,2 ]
Fan, Bin [1 ,2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Intelligence Sci & Technol, 30 Xueyuan Rd, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Inst Artificial Intelligence, 30 Xueyuan Rd, Beijing 100083, Peoples R China
Keywords
Image segmentation; Machine learning; Deep learning; Neural networks; Transformer; Pre-trained image model;
DOI
10.1016/j.neucom.2025.129915
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation plays a crucial role in robotic systems. Despite recent advances, we find that current state-of-the-art methods are hard to apply in practice owing to their weak generalization ability. In particular, diffusion-based segmentation methods over-rely on Ground Truth (GT) annotations: during training, GT masks are corrupted with noise and fed directly into the model's forward pass, which limits the model's ability to generalize. While the Segment Anything Model (SAM) excels at instance segmentation, it offers little control over granularity and produces no semantic labels. To address these issues, we propose SARFormer, a semantic segmentation algorithm guided by SAM. Unlike conventional methods, SARFormer uses GT solely for supervision and replaces the noisy GT input with SAM guidance, enabling better generalization. The key innovations are a region-based SAM optimizer that refines granularity and a feature aggregation method that enhances deep feature extraction. Experimental results show that SARFormer achieves competitive accuracy, demonstrating the effectiveness of SAM guidance in improving segmentation performance.
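The central idea in the abstract — GT annotations enter only the loss, while SAM-derived region masks replace noisy GT as the model's guidance input — can be sketched as follows. This is a minimal illustrative toy in NumPy, not the paper's implementation: the functions `sam_region_guidance` and `segmentation_model` and all shapes are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def sam_region_guidance(image, num_regions=4):
    """Stand-in for SAM: class-agnostic region masks of shape (H, W, R).
    (Hypothetical; a real pipeline would run the SAM mask generator.)"""
    h, w = image.shape[:2]
    return (rng.random((h, w, num_regions)) > 0.5).astype(np.float32)

def segmentation_model(image, guidance, num_classes=3):
    """Toy model: per-pixel class logits from image + guidance.
    Concatenation here loosely mimics 'feature aggregation'."""
    feats = np.concatenate([image, guidance], axis=-1)          # (H, W, C_in + R)
    weights = rng.standard_normal((feats.shape[-1], num_classes)) * 0.1
    return feats @ weights                                      # (H, W, num_classes)

def cross_entropy(logits, gt_labels):
    """GT appears only here, as supervision -- never as a model input."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = gt_labels.shape
    return -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], gt_labels].mean()

image = rng.random((8, 8, 3)).astype(np.float32)
gt = rng.integers(0, 3, size=(8, 8))                # used for the loss only

guidance = sam_region_guidance(image)               # SAM guidance replaces noisy GT input
logits = segmentation_model(image, guidance)
loss = cross_entropy(logits, gt)
print(float(loss))
```

The point of the sketch is the data flow: unlike a diffusion-style setup that noises GT masks and feeds them forward, the forward pass here sees only the image and SAM's class-agnostic guidance, so nothing GT-dependent leaks into inference.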
Pages: 9