SARFormer: Segmenting Anything Guided Transformer for semantic segmentation

Cited: 0
Authors
Zhang, Lixin [1 ,2 ]
Huang, Wenteng [1 ,2 ]
Fan, Bin [1 ,2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Intelligence Sci & Technol, 30 Xueyuan Rd, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Inst Artificial Intelligence, 30 Xueyuan Rd, Beijing 100083, Peoples R China
Keywords
Image segmentation; Machine learning; Deep learning; Neural networks; Transformer; Pre-trained image model;
DOI
10.1016/j.neucom.2025.129915
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation plays a crucial role in robotic systems. Despite recent advances, we find that current state-of-the-art methods are hard to apply in practice owing to their weak generalization ability. In particular, diffusion-based segmentation methods over-rely on Ground Truth (GT) annotations: during training, GT masks are corrupted with noise and fed directly into the model's forward pass, which limits the model's ability to generalize. While the Segment Anything Model (SAM) excels at instance segmentation, it offers little control over granularity and produces no semantic labels. To address these issues, we propose SARFormer, a semantic segmentation algorithm guided by SAM. Unlike conventional methods, SARFormer uses GT solely for supervision and replaces the noisy GT input with SAM guidance, enabling better generalization. The key innovations are a region-based SAM optimizer that refines granularity and a feature aggregation method that enhances deep feature extraction. Experimental results show that SARFormer achieves competitive accuracy, demonstrating the effectiveness of SAM guidance in improving segmentation performance.
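The central idea in the abstract — GT annotations enter only the loss, while SAM-derived region masks replace noisy GT as the model's guidance input — can be sketched as follows. This is a minimal illustrative toy in NumPy, not the paper's implementation: the functions `sam_region_guidance` and `segmentation_model` and all shapes are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def sam_region_guidance(image, num_regions=4):
    """Stand-in for SAM: class-agnostic region masks of shape (H, W, R).
    (Hypothetical; a real pipeline would run the SAM mask generator.)"""
    h, w = image.shape[:2]
    return (rng.random((h, w, num_regions)) > 0.5).astype(np.float32)

def segmentation_model(image, guidance, num_classes=3):
    """Toy model: per-pixel class logits from image + guidance.
    Concatenation here loosely mimics 'feature aggregation'."""
    feats = np.concatenate([image, guidance], axis=-1)          # (H, W, C_in + R)
    weights = rng.standard_normal((feats.shape[-1], num_classes)) * 0.1
    return feats @ weights                                      # (H, W, num_classes)

def cross_entropy(logits, gt_labels):
    """GT appears only here, as supervision -- never as a model input."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = gt_labels.shape
    return -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], gt_labels].mean()

image = rng.random((8, 8, 3)).astype(np.float32)
gt = rng.integers(0, 3, size=(8, 8))                # used for the loss only

guidance = sam_region_guidance(image)               # SAM guidance replaces noisy GT input
logits = segmentation_model(image, guidance)
loss = cross_entropy(logits, gt)
print(float(loss))
```

The point of the sketch is the data flow: unlike a diffusion-style setup that noises GT masks and feeds them forward, the forward pass here sees only the image and SAM's class-agnostic guidance, so nothing GT-dependent leaks into inference.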
Pages: 9