Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Authors
Yu, Yu-Chu [1]
Huang, Chi-Pin [1]
Chen, Jr-Jen [1]
Chang, Kai-Po [1]
Lai, Yung-Hsuan [1]
Yang, Fu-En [2]
Wang, Yu-Chiang Frank [1, 2]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
[2] NVIDIA, Santa Clara, CA USA
Keywords
Continual Learning; Vision-Language Models; Knowledge Distillation
DOI
10.1007/978-3-031-73347-5_13
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-scale vision-language models (VLMs) have shown strong zero-shot generalization on unseen-domain data. However, adapting pre-trained VLMs to a sequence of downstream tasks often leads to the forgetting of previously learned knowledge and a reduction in zero-shot classification performance. To tackle this problem, we propose a unique Selective Dual-Teacher Knowledge Transfer framework that leverages the most recent fine-tuned and the original pre-trained VLMs as dual teachers to preserve the previously learned knowledge and zero-shot capabilities, respectively. With access only to an unlabeled reference dataset, our proposed framework performs a selective knowledge distillation mechanism by measuring the feature discrepancy between the dual-teacher VLMs. Consequently, our selective dual-teacher knowledge distillation mitigates catastrophic forgetting of previously learned knowledge while preserving the zero-shot capabilities of pre-trained VLMs. Extensive experiments on benchmark datasets demonstrate that our framework performs favorably against state-of-the-art continual learning approaches in preventing catastrophic forgetting and zero-shot degradation. Project page: https://chuyu.org/research/snd.
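The abstract only states that distillation is selective and driven by the feature discrepancy between the two teachers; it does not give the formula. The snippet below is a minimal illustrative sketch of one plausible reading, not the authors' implementation. The function names, the exponential gate, the `temperature` parameter, and the squared-L2 feature loss are all assumptions introduced here for illustration: when the fine-tuned and pre-trained teachers disagree strongly on a reference sample, the student is pulled toward the fine-tuned teacher, and when they agree, toward the pre-trained one.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def selection_weight(f_finetuned, f_pretrained, temperature=1.0):
    """Hypothetical gate in [0, 1): grows with the teacher discrepancy.

    Assumption: a large discrepancy suggests the sample resembles
    previously learned data, so the fine-tuned teacher gets more weight.
    """
    discrepancy = l2_distance(f_finetuned, f_pretrained)
    return 1.0 - math.exp(-discrepancy / temperature)


def selective_distill_loss(f_student, f_finetuned, f_pretrained, temperature=1.0):
    """Weighted sum of squared-L2 feature losses against the two teachers."""
    w = selection_weight(f_finetuned, f_pretrained, temperature)
    loss_finetuned = l2_distance(f_student, f_finetuned) ** 2
    loss_pretrained = l2_distance(f_student, f_pretrained) ** 2
    return w * loss_finetuned + (1.0 - w) * loss_pretrained
```

In this sketch, a reference sample on which both teachers produce identical features yields a weight of zero, so the loss reduces to matching the pre-trained teacher alone. The actual selection rule and loss used in the paper may differ; the project page linked in the abstract is the authoritative source.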
Pages: 219-236
Page count: 18