Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Times Cited: 0
Authors
Shao, Tong [1 ]
Tian, Zhuotao [1 ]
Zhao, Hang [1 ]
Su, Jingyong [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144
Funding
National Natural Science Foundation of China;
Keywords
CLIP; Training-free; Semantic Segmentation;
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges because its original image-level alignment training limits its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach demonstrates notable improvements in segmentation accuracy and in maintaining semantic coherence across objects. Experiments show that our method outperforms CLIP by 22.3% on average across 9 segmentation benchmarks, surpassing existing state-of-the-art training-free methods. The code is made publicly available at https://github.com/leaves162/CLIPtrase.
Pages: 139-156
Page count: 18
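To make the core idea in the abstract concrete, the sketch below illustrates, in a generic way, what "recalibrated self-correlation among patches" for training-free segmentation could look like: CLIP patch embeddings are pooled through their pairwise correlation before being matched against class-name text embeddings. This is a minimal illustrative sketch, not the authors' method; all names, shapes, and the temperature value are assumptions, and the actual CLIPtrase implementation is at https://github.com/leaves162/CLIPtrase.

```python
# Minimal sketch (not the CLIPtrase implementation) of segmenting with
# CLIP features via patch self-correlation. Shapes and names are assumed.
import torch
import torch.nn.functional as F


def correlation_recalibrated_segmentation(
    patch_feats: torch.Tensor,   # (N, D) patch embeddings from CLIP's image encoder (assumed input)
    text_feats: torch.Tensor,    # (C, D) text embeddings of the candidate class names (assumed input)
    temperature: float = 0.07,   # softmax sharpness for the patch-patch correlation (assumption)
) -> torch.Tensor:
    """Return a (N,) tensor of class indices, one per patch."""
    p = F.normalize(patch_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)

    # Patch-patch self-correlation: how strongly each patch relates to the others.
    corr = p @ p.t() / temperature                   # (N, N)
    weights = corr.softmax(dim=-1)                   # each row sums to 1

    # Recalibrate every patch as a correlation-weighted mix of all patches,
    # pooling evidence from semantically similar regions instead of the global [CLS] context.
    recalibrated = F.normalize(weights @ p, dim=-1)  # (N, D)

    # Zero-shot labeling: cosine similarity against the class text embeddings.
    logits = recalibrated @ t.t()                    # (N, C)
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    # Dummy shapes: a 14x14 patch grid (196 patches), 512-dim CLIP space, 5 classes.
    patches = torch.randn(196, 512)
    texts = torch.randn(5, 512)
    labels = correlation_recalibrated_segmentation(patches, texts)
    print(labels.view(14, 14).shape)  # per-patch class map on the patch grid
```

In practice the per-patch class map would be upsampled to the image resolution; the sketch stops at the patch level to keep the correlation-recalibration step in focus.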