Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models

Cited by: 0
Authors
Zheng, Mengyu [1 ,2 ]
Tang, Yehui [2 ]
Hao, Zhiwei [2 ,3 ]
Han, Kai [2]
Wang, Yunhe [2 ]
Xu, Chang [1 ]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW, Australia
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
[3] Beijing Inst Technol, Sch Informat & Elect, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LIV | 2025 / Vol. 15112
Funding
Australian Research Council;
Keywords
Multi-modal model; Continual learning; Zero-shot learning; Graph-based distillation;
DOI
10.1007/978-3-031-72949-2_7
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal models such as CLIP possess remarkable zero-shot transfer capabilities, making them highly effective in continual learning tasks. However, this advantage is severely compromised by catastrophic forgetting, which undermines the valuable zero-shot abilities of these models. Existing methods predominantly focus on preserving zero-shot capabilities but often fall short of fully exploiting the rich modal information inherent in multi-modal models. In this paper, we propose a strategy that enhances both zero-shot transfer ability and adaptability to new data distributions. We introduce a novel graph-based multi-modal proximity distillation approach that preserves intra- and inter-modal information for the visual and textual modalities. This approach is further enhanced with a sample re-weighting mechanism that dynamically adjusts the influence of each teacher on every individual sample. Experimental results demonstrate a considerable improvement over existing methods, illustrating the effectiveness of the proposed approach in the field of continual learning. Code is available at github.com/myz-ah/AwoForget.
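The abstract describes the method only at a high level. As a rough illustration (not the authors' released implementation; see the repository above for that), the following PyTorch sketch shows one way graph-based proximity distillation with per-sample dual-teacher re-weighting could be realized. All names here (proximity_graph, graph_kl, dual_teacher_proximity_loss, the temperature tau, the weight vector w) are assumptions made for illustration, not identifiers from the AwoForget codebase.

```python
import torch
import torch.nn.functional as F

def proximity_graph(a: torch.Tensor, b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # Row-stochastic similarity graph between two embedding sets.
    # a == b yields an intra-modal graph (image-image or text-text);
    # a != b yields the inter-modal image-text graph.
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return F.softmax(a @ b.t() / tau, dim=-1)  # shape (B, B)

def graph_kl(g_teacher: torch.Tensor, g_student: torch.Tensor) -> torch.Tensor:
    # Per-sample KL(teacher row || student row); returns shape (B,).
    eps = 1e-8
    return (g_teacher * (g_teacher.clamp_min(eps).log()
                         - g_student.clamp_min(eps).log())).sum(dim=-1)

def dual_teacher_proximity_loss(s_img, s_txt, zs_img, zs_txt, ft_img, ft_txt, w):
    # s_*  : student image/text features, shape (B, D)
    # zs_* : features from the frozen zero-shot CLIP teacher
    # ft_* : features from the teacher adapted on previous tasks
    # w    : per-sample weights in [0, 1], shape (B,); larger values
    #        lean on the zero-shot teacher, smaller on the adapted one.
    def relational_kl(t_img, t_txt):
        # Intra-modal (image-image, text-text) plus inter-modal
        # (image-text) proximity graphs, compared row by row.
        loss = graph_kl(proximity_graph(t_img, t_img),
                        proximity_graph(s_img, s_img))
        loss += graph_kl(proximity_graph(t_txt, t_txt),
                         proximity_graph(s_txt, s_txt))
        loss += graph_kl(proximity_graph(t_img, t_txt),
                         proximity_graph(s_img, s_txt))
        return loss  # shape (B,)

    zs_loss = relational_kl(zs_img, zs_txt)   # stay close to zero-shot CLIP
    ft_loss = relational_kl(ft_img, ft_txt)   # stay close to past adaptation
    return (w * zs_loss + (1.0 - w) * ft_loss).mean()

# Toy usage: six (B, D) feature batches and a per-sample weight vector.
B, D = 32, 512
feats = [torch.randn(B, D) for _ in range(6)]
w = torch.rand(B)
loss = dual_teacher_proximity_loss(*feats, w)
```

In this reading, w plays the role of the sample re-weighting mechanism the abstract mentions: samples the zero-shot teacher still represents well keep its relational structure, while samples from the new distribution defer to the previously adapted teacher.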
Pages: 109 - 125
Page count: 17
Related Papers
50 records in total
  • [1] Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
    Yu, Yu-Chu
    Huang, Chi-Pin
    Chen, Jr-Jen
    Chang, Kai-Po
    Lai, Yung-Hsuan
    Yang, Fu-En
    Wang, Yu-Chiang Frank
    COMPUTER VISION - ECCV 2024, PT XXVI, 2025, 15084 : 219 - 236
  • [2] GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
    Li, Xin
    Lian, Dongze
    Lu, Zhihe
    Bai, Jiawang
    Chen, Zhibo
    Wang, Xinchao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [4] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [5] Vision-Language Models for Biomedical Applications
    Thapa, Surendrabikram
    Naseem, Usman
    Zhou, Luping
    Kim, Jinman
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON VISION-LANGUAGE MODELS FOR BIOMEDICAL APPLICATIONS, VLM4BIO 2024, 2024, : 1 - 2
  • [6] The Neglected Tails in Vision-Language Models
    Parashar, Shubham
    Lin, Zhiqiu
    Liu, Tian
    Dong, Xiangjue
    Li, Yanan
    Ramanan, Deva
    Caverlee, James
    Kong, Shu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 12988 - 12997
  • [7] Vision-Language Models as Success Detectors
    Du, Yuqing
    Konyushkova, Ksenia
    Denil, Misha
    Raju, Akhil
    Landon, Jessica
    Hill, Felix
    de Freitas, Nando
    Cabi, Serkan
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 120 - 136
  • [8] Debiasing Vision-Language Models for Vision Tasks: A Survey
    Zhu, Beier
    Zhang, Hanwang
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (01)
  • [9] Contrastive Region Guidance: Improving Grounding in Vision-Language Models Without Training
    Wan, David
    Cho, Jaemin
    Stengel-Eskin, Elias
    Bansal, Mohit
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 198 - 215