Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Times Cited: 0
Authors
Shao, Tong [1 ]
Tian, Zhuotao [1 ]
Zhao, Hang [1 ]
Su, Jingyong [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144
Funding
National Natural Science Foundation of China;
Keywords
CLIP; Training-free; Semantic Segmentation;
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges because its original image-level alignment training limits its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach demonstrates notable improvements in segmentation accuracy and in maintaining semantic coherence across objects. Experiments show that our method outperforms CLIP by 22.3% on average across 9 segmentation benchmarks, surpassing existing state-of-the-art training-free methods. The code is made publicly available at https://github.com/leaves162/CLIPtrase.
Pages: 139-156
Page count: 18
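To make the core idea in the abstract concrete, the sketch below illustrates, in a generic way, what "recalibrated self-correlation among patches" for training-free segmentation could look like: CLIP patch embeddings are pooled through their pairwise correlation before being matched against class-name text embeddings. This is a minimal illustrative sketch, not the authors' method; all names, shapes, and the temperature value are assumptions, and the actual CLIPtrase implementation is at https://github.com/leaves162/CLIPtrase.

```python
# Minimal sketch (not the CLIPtrase implementation) of segmenting with
# CLIP features via patch self-correlation. Shapes and names are assumed.
import torch
import torch.nn.functional as F


def correlation_recalibrated_segmentation(
    patch_feats: torch.Tensor,   # (N, D) patch embeddings from CLIP's image encoder (assumed input)
    text_feats: torch.Tensor,    # (C, D) text embeddings of the candidate class names (assumed input)
    temperature: float = 0.07,   # softmax sharpness for the patch-patch correlation (assumption)
) -> torch.Tensor:
    """Return a (N,) tensor of class indices, one per patch."""
    p = F.normalize(patch_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)

    # Patch-patch self-correlation: how strongly each patch relates to the others.
    corr = p @ p.t() / temperature                   # (N, N)
    weights = corr.softmax(dim=-1)                   # each row sums to 1

    # Recalibrate every patch as a correlation-weighted mix of all patches,
    # pooling evidence from semantically similar regions instead of the global [CLS] context.
    recalibrated = F.normalize(weights @ p, dim=-1)  # (N, D)

    # Zero-shot labeling: cosine similarity against the class text embeddings.
    logits = recalibrated @ t.t()                    # (N, C)
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    # Dummy shapes: a 14x14 patch grid (196 patches), 512-dim CLIP space, 5 classes.
    patches = torch.randn(196, 512)
    texts = torch.randn(5, 512)
    labels = correlation_recalibrated_segmentation(patches, texts)
    print(labels.view(14, 14).shape)  # per-patch class map on the patch grid
```

In practice the per-patch class map would be upsampled to the image resolution; the sketch stops at the patch level to keep the correlation-recalibration step in focus.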