Adversarial domain adaptation with CLIP for few-shot image classification

Authors
Sun, Tongfeng [1,2]
Yang, Hongjian [1]
Li, Zhongnian [1,2]
Xu, Xinzheng [1,2]
Wang, Xiurui [1]
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, People's Republic of China
[2] Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, Jiangsu, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Few-shot learning; Adversarial domain adaptation; Multi-modal features; Knowledge transfer
DOI
10.1007/s10489-024-06088-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Few-shot learning focuses on training efficient models from a limited amount of training data, and its mainstream approaches have evolved from single-modal to multi-modal methods. CLIP (Contrastive Language-Image Pre-training) performs image classification by aligning the embedding spaces of images and text. To better transfer knowledge between the image and text domains, we propose a CLIP-based fine-tuning framework for vision-language models. It introduces a novel adversarial domain adaptation approach that trains a symmetric text-image classifier to identify the differences between the two domains. To align text and images more effectively in the same space, we adopt two types of confusion loss and construct the aligned semantic space by fine-tuning the multi-modal feature extractor. Experiments on 11 public datasets show that the proposed method outperforms state-of-the-art CLIP-driven learning methods.
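The abstract does not define the classifier or the two confusion losses, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates (a) CLIP-style classification as temperature-scaled cosine similarity between image embeddings and one text embedding per class, and (b) a binary image-vs-text domain classifier trained with cross-entropy, alongside two confusion losses commonly used in adversarial domain adaptation: a uniform-target loss and an inverted-label loss. All function names, the temperature of 0.01, the random toy features, and the linear classifier weights are hypothetical; the paper's symmetric classifier and its confusion losses may be defined differently.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def l2_normalize(x):
    """Scale each row of x to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + EPS)


def clip_logits(image_feats, text_feats, temperature=0.01):
    """CLIP-style class scores with an assumed temperature.

    image_feats: (N, D) image embeddings; text_feats: (K, D), one prompt embedding per class.
    Returns (N, K) logits; argmax along axis 1 is the predicted class.
    """
    return l2_normalize(image_feats) @ l2_normalize(text_feats).T / temperature


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminator_loss(p_img, p_txt):
    """Binary cross-entropy for a hypothetical domain classifier.

    p_img / p_txt: predicted probability of "image domain" for image / text features.
    Image features carry label 1, text features label 0.
    """
    return -(np.mean(np.log(p_img + EPS)) + np.mean(np.log(1.0 - p_txt + EPS)))


def uniform_confusion_loss(p):
    """Cross-entropy against the uniform target (0.5, 0.5); smallest when p == 0.5."""
    return -np.mean(0.5 * np.log(p + EPS) + 0.5 * np.log(1.0 - p + EPS))


def inverted_label_loss(p_img, p_txt):
    """GAN-style confusion: rewards features the classifier assigns to the opposite domain."""
    return -(np.mean(np.log(1.0 - p_img + EPS)) + np.mean(np.log(p_txt + EPS)))


# Toy run: random features stand in for CLIP encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(4, 8))  # 4 images, 8-dim embeddings
text_feats = rng.normal(size=(3, 8))   # 3 class prompts
logits = clip_logits(image_feats, text_feats)
preds = logits.argmax(axis=1)          # predicted class index per image

w = rng.normal(size=8)  # hypothetical linear domain classifier weights
p_img = sigmoid(l2_normalize(image_feats) @ w)
p_txt = sigmoid(l2_normalize(text_feats) @ w)
d_loss = discriminator_loss(p_img, p_txt)                         # classifier minimises this
c_loss = uniform_confusion_loss(np.concatenate([p_img, p_txt]))   # extractors minimise this
```

In an adversarial training loop one would typically alternate between minimising `discriminator_loss` with respect to the domain classifier and minimising a confusion loss with respect to the feature extractors, which pushes image and text features toward a shared space that the classifier cannot separate.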
Pages: 12