SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

Cited by: 10

Authors:

Udandarao, Vishaal ^{[1]}

Gupta, Ankush ^{[2]}

Albanie, Samuel ^{[1]}

Affiliations:

[1] Univ Cambridge, Cambridge, England

[2] DeepMind, London, England

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023

Keywords:

DOI:

10.1109/ICCV51070.2023.00257

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval performance on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target task distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks, "SuS" and "TIP-X", that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art (SoTA) zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve SoTA results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.

Citation

Pages: 2725-2736

Page count: 12
