Online Zero-Shot Classification with CLIP

Cited by: 0
Authors
Qian, Qi [1 ]
Hu, Juhua [2 ]
Affiliations
[1] Alibaba Grp, Bellevue, WA 98004 USA
[2] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
Source
COMPUTER VISION - ECCV 2024, PT LXXVII | 2024 / Vol. 15135
Keywords
Online learning; Zero-shot classification; CLIP;
DOI
10.1007/978-3-031-72980-5_27
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-language pre-training such as CLIP enables zero-shot transfer that classifies images according to candidate class names. While CLIP demonstrates impressive zero-shot performance on diverse downstream tasks, the distribution of the target data has not been sufficiently leveraged. In this work, we study a novel online zero-shot transfer scenario, where images arrive in a random order and each is visited only once to obtain an immediate prediction, without storing its representation. Compared with vanilla zero-shot classification, the proposed framework preserves the flexibility of online service while exploiting the statistics of previously arrived images as side information to capture the distribution of the target data, which helps improve performance in real-world applications. To tackle the challenge of effective online optimization, we first develop online label learning to model the target data distribution. Then, the proxy of each class in the vision space is further optimized with the proposed online proxy learning method to mitigate the modality gap between images and text. The convergence of both online strategies can be theoretically guaranteed. By combining the predictions from online label learning and online proxy learning, our online zero-shot transfer method (OnZeta) achieves 78.94% accuracy on ImageNet without accessing the entire dataset. Moreover, extensive experiments on 13 other downstream tasks with different vision encoders show an average improvement of more than 3%, demonstrating the effectiveness of our proposal.
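A rough illustration of the pipeline the abstract describes: a text-side zero-shot score reweighted by an online estimate of the label distribution, a vision-side score from class proxies updated one image at a time, and a mixture of the two for the final prediction. The Python sketch below is an assumption-laden reading of that description, not OnZeta's actual algorithm: the temperature, step-size schedule, mixing weight, and update rules are placeholders, and random unit vectors stand in for CLIP embeddings so the snippet runs standalone.

# Minimal sketch of the online zero-shot scheme outlined in the abstract.
# NOT the authors' implementation: update rules, step sizes, and the mixing
# weight `alpha` are illustrative assumptions; random unit vectors stand in
# for (unit-normalized) CLIP image/text embeddings.
import numpy as np

rng = np.random.default_rng(0)

def normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

C, d = 10, 512                                 # classes, embedding dimension
text_emb = normalize(rng.normal(size=(C, d)))  # stand-in for CLIP text embeddings

# Online state: an estimated label distribution (online label learning) and
# vision-space class proxies initialized from text (online proxy learning).
label_prior = np.full(C, 1.0 / C)
proxies = text_emb.copy()
tau, eta, alpha = 0.05, 0.1, 0.5               # temperature, proxy step, mixing weight

def classify(image_emb, t):
    """Predict one arriving image, then update the online state in place."""
    # Text-side zero-shot scores, reweighted by the running label prior.
    s_text = softmax(image_emb @ text_emb.T / tau) * label_prior
    s_text /= s_text.sum()
    # Vision-side scores against the (re-normalized) class proxies.
    s_proxy = softmax(image_emb @ normalize(proxies).T / tau)
    p = alpha * s_text + (1 - alpha) * s_proxy  # combined prediction
    y = int(p.argmax())
    # Online updates: a running average for the prior, and a decaying step
    # pulling the winning proxy toward the current image embedding.
    label_prior[:] = (t * label_prior + p) / (t + 1)
    proxies[y] += eta / np.sqrt(t + 1) * (image_emb - proxies[y])
    return y

# Each image in the stream is visited exactly once; no features are stored.
for t in range(100):
    classify(normalize(rng.normal(size=d)), t)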
Pages: 462 - 477
Number of pages: 16