SARFormer: Segmenting Anything Guided Transformer for semantic segmentation

Cited: 0
Authors
Zhang, Lixin [1 ,2 ]
Huang, Wenteng [1 ,2 ]
Fan, Bin [1 ,2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Intelligence Sci & Technol, 30 Xueyuan Rd, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Inst Artificial Intelligence, 30 Xueyuan Rd, Beijing 100083, Peoples R China
Keywords
Image segmentation; Machine learning; Deep learning; Neural networks; Transformer; Pre-trained image model;
DOI
10.1016/j.neucom.2025.129915
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation plays a crucial role in robotic systems. Despite recent advances, current state-of-the-art methods remain difficult to apply in practice because of their weak generalization ability. In particular, diffusion-based segmentation methods over-rely on Ground Truth (GT) annotations: the GT masks are corrupted with noise and fed directly into the model's forward pass during training, which limits the model's ability to generalize. While the Segment Anything Model (SAM) excels at instance segmentation, it offers little control over granularity and lacks semantic information. To address these issues, we propose SARFormer, a semantic segmentation algorithm guided by SAM. Unlike conventional methods, SARFormer uses GT solely for supervision and replaces the noisy GT input with SAM guidance, enabling better generalization. Its key innovations are a region-based SAM optimizer that refines granularity and a feature aggregation method that enhances deep feature extraction. Experimental results show that SARFormer achieves competitive accuracy, demonstrating the effectiveness of SAM guidance in improving segmentation performance.
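The abstract's central contrast — noising GT and feeding it into the forward pass versus feeding SAM guidance and reserving GT for the loss — can be sketched as follows. This is a minimal toy illustration, not the paper's actual architecture: the array shapes, the feature tensor, the SAM mask, and the dummy prediction are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a 4x4 GT label map with 3 classes, and a
# class-agnostic region map standing in for a SAM output.
gt_mask = rng.integers(0, 3, size=(4, 4))
sam_mask = (gt_mask > 0).astype(np.float32)  # stand-in for SAM guidance

def diffusion_style_input(gt, noise_std=0.5):
    """Conventional diffusion-based training: the GT mask itself is
    corrupted with noise and fed into the forward pass -- the
    over-reliance the abstract criticizes."""
    return gt.astype(np.float32) + noise_std * rng.standard_normal(gt.shape)

def sarformer_style_input(image_feat, sam_guidance):
    """SARFormer-style training (as the abstract describes it): the
    forward pass sees only image features plus SAM guidance; GT never
    enters the forward computation."""
    return np.concatenate([image_feat, sam_guidance[None]], axis=0)

image_feat = rng.standard_normal((8, 4, 4)).astype(np.float32)  # toy features

noisy_in = diffusion_style_input(gt_mask)            # GT leaks into the input
guided_in = sarformer_style_input(image_feat, sam_mask)  # GT does not

# GT is used solely for supervision, e.g. a placeholder pixel-wise loss:
pred = sam_mask.astype(int)                  # dummy "prediction"
loss = float(np.mean(pred != gt_mask))       # toy 0-1 error, loss only
```

The point of the sketch is structural: in the first function the GT tensor is an input to the network, while in the second it appears only when computing the loss, which is the generalization argument the abstract makes.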
Pages: 9