SARFormer: Segmenting Anything Guided Transformer for semantic segmentation

Cited: 0
Authors
Zhang, Lixin [1 ,2 ]
Huang, Wenteng [1 ,2 ]
Fan, Bin [1 ,2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Intelligence Sci & Technol, 30 Xueyuan Rd, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Inst Artificial Intelligence, 30 Xueyuan Rd, Beijing 100083, Peoples R China
Keywords
Image segmentation; Machine learning; Deep learning; Neural networks; Transformer; Pre-trained image model;
DOI
10.1016/j.neucom.2025.129915
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation plays a crucial role in robotic systems. Despite recent advances, current state-of-the-art methods remain difficult to apply in practice because of their weak generalization ability. In particular, diffusion-based segmentation methods over-rely on Ground Truth (GT) annotations: the GT masks are corrupted with noise and fed directly into the model's forward pass during training, which limits the model's ability to generalize. While the Segment Anything Model (SAM) excels at instance segmentation, it offers little control over granularity and lacks semantic information. To address these issues, we propose SARFormer, a semantic segmentation algorithm guided by SAM. Unlike conventional methods, SARFormer uses GT solely for supervision and replaces the noisy GT input with SAM guidance, enabling better generalization. Its key innovations are a region-based SAM optimizer that refines granularity and a feature aggregation method that enhances deep feature extraction. Experimental results show that SARFormer achieves competitive accuracy, demonstrating the effectiveness of SAM guidance in improving segmentation performance.
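The abstract's central contrast — noising GT and feeding it into the forward pass versus feeding SAM guidance and reserving GT for the loss — can be sketched as follows. This is a minimal toy illustration, not the paper's actual architecture: the array shapes, the feature tensor, the SAM mask, and the dummy prediction are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a 4x4 GT label map with 3 classes, and a
# class-agnostic region map standing in for a SAM output.
gt_mask = rng.integers(0, 3, size=(4, 4))
sam_mask = (gt_mask > 0).astype(np.float32)  # stand-in for SAM guidance

def diffusion_style_input(gt, noise_std=0.5):
    """Conventional diffusion-based training: the GT mask itself is
    corrupted with noise and fed into the forward pass -- the
    over-reliance the abstract criticizes."""
    return gt.astype(np.float32) + noise_std * rng.standard_normal(gt.shape)

def sarformer_style_input(image_feat, sam_guidance):
    """SARFormer-style training (as the abstract describes it): the
    forward pass sees only image features plus SAM guidance; GT never
    enters the forward computation."""
    return np.concatenate([image_feat, sam_guidance[None]], axis=0)

image_feat = rng.standard_normal((8, 4, 4)).astype(np.float32)  # toy features

noisy_in = diffusion_style_input(gt_mask)            # GT leaks into the input
guided_in = sarformer_style_input(image_feat, sam_mask)  # GT does not

# GT is used solely for supervision, e.g. a placeholder pixel-wise loss:
pred = sam_mask.astype(int)                  # dummy "prediction"
loss = float(np.mean(pred != gt_mask))       # toy 0-1 error, loss only
```

The point of the sketch is structural: in the first function the GT tensor is an input to the network, while in the second it appears only when computing the loss, which is the generalization argument the abstract makes.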
Pages: 9