SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

Cited by: 10
Authors
Udandarao, Vishaal [1]
Gupta, Ankush [2]
Albanie, Samuel [1]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] DeepMind, London, England
DOI
10.1109/ICCV51070.2023.00257
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval performance on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target task distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks, "SuS" and "TIP-X", that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art (SoTA) zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve SoTA results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.
Pages: 2725-2736
Number of pages: 12