Adversarial domain adaptation with CLIP for few-shot image classification

Cited: 0
Authors
Sun, Tongfeng [1 ,2 ]
Yang, Hongjian [1 ]
Li, Zhongnian [1 ,2 ]
Xu, Xinzheng [1 ,2 ]
Wang, Xiurui [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Mine Digitizat Engn Res Ctr, Xuzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Few-shot learning; Adversarial domain adaptation; Multi-modal features; Knowledge transfer;
DOI
10.1007/s10489-024-06088-4
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot learning focuses on training effective models with limited amounts of training data. Its mainstream approaches have evolved from single-modal to multi-modal methods. The Contrastive Language-Image Pre-training model, known as CLIP, achieves image classification by aligning the embedding spaces of images and text. To better achieve knowledge transfer between the image and text domains, we propose a fine-tuning framework for vision-language models built on CLIP. It introduces a novel adversarial domain adaptation approach that trains a classifier, symmetrical over text and images, to identify the differences between the two domains. To align text and images into the same space more effectively, we adopt two types of confusion losses to construct the aligned semantic space by fine-tuning the multi-modal feature extractor. Experiments on 11 public datasets show that the proposed method outperforms state-of-the-art CLIP-driven learning methods.
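The confusion-loss idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: one common form of a confusion loss is the cross-entropy between a domain classifier's predicted distribution and the uniform distribution over domains, so that minimizing it with respect to the feature extractor makes image and text features indistinguishable to the classifier.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def domain_confusion_loss(domain_logits):
    """Cross-entropy against the uniform distribution over domains.

    domain_logits: (batch, num_domains) outputs of a domain classifier
    applied to image or text features. Minimizing this loss (w.r.t. the
    feature extractor, in an adversarial setup) pushes the classifier
    toward maximal uncertainty, i.e. domain-aligned features.
    """
    p = softmax(domain_logits)
    num_domains = domain_logits.shape[1]
    # Uniform target: each domain gets weight 1/num_domains.
    return -np.mean(np.sum(np.log(p + 1e-12) / num_domains, axis=1))
```

With two domains (image and text), the loss reaches its minimum value log(2) when the classifier outputs a 50/50 prediction, and grows as the classifier becomes confident about either domain.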
Pages: 12