Adversarial domain adaptation with CLIP for few-shot image classification

Times Cited: 0
Authors
Sun, Tongfeng [1 ,2 ]
Yang, Hongjian [1 ]
Li, Zhongnian [1 ,2 ]
Xu, Xinzheng [1 ,2 ]
Wang, Xiurui [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Mine Digitizat Engn Res Ctr, Xuzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Few-shot learning; Adversarial domain adaptation; Multi-modal features; Knowledge transfer;
DOI
10.1007/s10489-024-06088-4
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Few-shot learning focuses on training efficient models with limited amounts of training data. Its mainstream approaches have evolved from single-modal to multi-modal methods. The Contrastive Language-Image Pre-training model, known as CLIP, achieves image classification by aligning the embedding spaces of images and text. To better achieve knowledge transfer between the image domain and the text domain, we propose a fine-tuning framework for vision-language models built on CLIP. It introduces a novel adversarial domain adaptation approach that trains a symmetric text-image classifier to identify the differences between the two domains. To align text and images in the same space more effectively, we adopt two types of confusion losses to construct the aligned semantic space by fine-tuning the multi-modal feature extractor. Experiments on 11 public datasets show that the proposed method outperforms state-of-the-art CLIP-driven learning methods.
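The adversarial scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the discriminator architecture, embedding dimension, and loss weighting are assumptions; only the overall pattern (a modality discriminator trained to separate image and text features, opposed by confusion losses that push its predictions toward uniform) follows the abstract.

```python
# Hedged sketch of adversarial image-text domain alignment (assumed details:
# a 2-layer discriminator, 512-dim CLIP-style embeddings, KL-to-uniform
# confusion loss). Random tensors stand in for frozen CLIP features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainDiscriminator(nn.Module):
    """Binary classifier that tries to tell image embeddings from text embeddings."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def confusion_loss(logits: torch.Tensor) -> torch.Tensor:
    """Push discriminator outputs toward a uniform distribution, so the
    feature extractor makes the two modalities indistinguishable."""
    log_p = F.log_softmax(logits, dim=-1)
    uniform = torch.full_like(log_p, 1.0 / log_p.size(-1))
    return F.kl_div(log_p, uniform, reduction="batchmean")


disc = DomainDiscriminator(dim=512)
img_z = F.normalize(torch.randn(8, 512), dim=-1)  # stand-in image features
txt_z = F.normalize(torch.randn(8, 512), dim=-1)  # stand-in text features

# Discriminator step: distinguish modalities (labels: 0 = image, 1 = text).
d_loss = F.cross_entropy(disc(img_z), torch.zeros(8, dtype=torch.long)) + \
         F.cross_entropy(disc(txt_z), torch.ones(8, dtype=torch.long))

# Extractor step: confusion losses on both modalities (in a real setup this
# gradient would flow into the fine-tuned CLIP feature extractor).
c_loss = confusion_loss(disc(img_z)) + confusion_loss(disc(txt_z))
```

In a full training loop the two losses would be minimized in alternation (or via a gradient-reversal layer), with the discriminator and the feature extractor pulling in opposite directions until the image and text embeddings occupy a shared space.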
Pages: 12
Related References
51 records in total
[1]  
Achiam J., 2023, OpenAI GPT-4 technical report, DOI [DOI 10.48550/ARXIV.2303.08774, 10.48550/arxiv.2303.08774]
[2]  
Adnan M, 2024, MLSYS
[3]  
[Anonymous], 2012, CORR
[4]   Improved Few-Shot Visual Classification [J].
Bateni, Peyman ;
Goyal, Raghav ;
Masrani, Vaden ;
Wood, Frank ;
Sigal, Leonid .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :14481-14490
[5]  
Bossard L, 2014, LECT NOTES COMPUT SC, V8694, P446, DOI 10.1007/978-3-319-10599-4_29
[6]   End-to-End Object Detection with Transformers [J].
Carion, Nicolas ;
Massa, Francisco ;
Synnaeve, Gabriel ;
Usunier, Nicolas ;
Kirillov, Alexander ;
Zagoruyko, Sergey .
COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229
[7]   Describing Textures in the Wild [J].
Cimpoi, Mircea ;
Maji, Subhransu ;
Kokkinos, Iasonas ;
Mohamed, Sammy ;
Vedaldi, Andrea .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :3606-3613
[8]   Imaging Findings and Clinical Analysis of Primary Intracranial Pure Yolk Sac Tumors in Children and Adolescents: A Retrospective Study from China [J].
Dai, W. ;
Liu, H. ;
Chen, Y. ;
Chen, Z. .
AMERICAN JOURNAL OF NEURORADIOLOGY, 2022, :1054-1059
[9]  
Dathathri S, 2020, ICLR
[10]   Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model [J].
Du, Yu ;
Wei, Fangyun ;
Zhang, Zihe ;
Shi, Miaojing ;
Gao, Yue ;
Li, Guoqi .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :14064-14073