Adversarial domain adaptation with CLIP for few-shot image classification

Cited by: 0
Authors
Sun, Tongfeng [1, 2]
Yang, Hongjian [1]
Li, Zhongnian [1, 2]
Xu, Xinzheng [1, 2]
Wang, Xiurui [1]
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, People's Republic of China
[2] Mine Digitization Engineering Research Center, Ministry of Education of the People's Republic of China, Xuzhou, Jiangsu, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Few-shot learning; Adversarial domain adaptation; Multi-modal features; Knowledge transfer
DOI
10.1007/s10489-024-06088-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Few-shot learning focuses on training efficient models with limited amounts of training data. Its mainstream approaches have evolved from single-modal to multi-modal methods. The Contrastive Language-Image Pre-training model, known as CLIP, performs image classification by aligning the embedding spaces of images and text. To improve knowledge transfer between the image domain and the text domain, we propose a fine-tuning framework for vision-language models built on CLIP. It introduces a novel adversarial domain adaptation approach that trains a symmetric text-image classifier to identify the differences between the two domains. To align text and image more effectively in a shared space, we adapt two types of confusion loss and construct the aligned semantic space by fine-tuning the multi-modal feature extractor. Experiments on 11 public datasets show that the proposed method outperforms state-of-the-art CLIP-driven learning methods.
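The two ideas the abstract names can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two confusion losses, so the loss below is one common formulation (cross-entropy against a uniform target over the two domains), and all function names and the toy feature vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_by_alignment(image_feat, text_feats):
    """Return the index of the class whose text embedding is most
    similar to the image embedding (classification in a shared space)."""
    sims = [cosine(image_feat, t) for t in text_feats]
    return max(range(len(sims)), key=sims.__getitem__)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def domain_confusion_loss(domain_logits):
    """Assumed formulation: cross-entropy between the domain classifier's
    output (image vs. text) and a uniform target. It is smallest when the
    classifier cannot tell which domain a feature came from."""
    probs = softmax(domain_logits)
    return -sum(math.log(p) for p in probs) / len(probs)

# Toy 2-D embeddings: two class prompts and one image feature.
text_feats = [[3.0, 0.0], [0.0, 1.0]]
predicted = classify_by_alignment([1.0, 0.1], text_feats)

# An undecided domain classifier gives a lower confusion loss than a confident one.
undecided = domain_confusion_loss([0.0, 0.0])
confident = domain_confusion_loss([5.0, -5.0])
```

In an adversarial setup of this kind, the domain classifier would be trained to separate image features from text features, while the feature extractor would be fine-tuned to reduce a confusion loss such as the one above.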
Pages: 12