Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models

Cited by: 0
Authors
Zheng, Mengyu [1 ,2 ]
Tang, Yehui [2 ]
Hao, Zhiwei [2 ,3 ]
Han, Kai [2]
Wang, Yunhe [2 ]
Xu, Chang [1 ]
Affiliations
[1] Univ Sydney, Fac Engn, Sch Comp Sci, Sydney, NSW, Australia
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
[3] Beijing Inst Technol, Sch Informat & Elect, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LIV | 2025 / Vol. 15112
Funding
Australian Research Council;
Keywords
Multi-modal model; Continual learning; Zero-shot learning; Graph-based distillation;
DOI
10.1007/978-3-031-72949-2_7
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal models such as CLIP possess remarkable zero-shot transfer capabilities, making them highly effective in continual learning tasks. However, this advantage is severely compromised by catastrophic forgetting, which undermines the valuable zero-shot abilities of these models. Existing methods predominantly focus on preserving zero-shot capabilities but often fall short of fully exploiting the rich modal information inherent in multi-modal models. In this paper, we propose a strategy that enhances both zero-shot transfer ability and adaptability to new data distributions. We introduce a novel graph-based multi-modal proximity distillation approach that preserves intra- and inter-modal information for the visual and textual modalities. This approach is further enhanced with a sample re-weighting mechanism that dynamically adjusts the influence of each teacher on every individual sample. Experimental results demonstrate a considerable improvement over existing methods, illustrating the effectiveness of the proposed approach in the field of continual learning. Code is available at github.com/myz-ah/AwoForget.
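The abstract describes the method only at a high level. As a rough illustration (not the authors' released implementation; see the repository above for that), the following PyTorch sketch shows one way graph-based proximity distillation with per-sample dual-teacher re-weighting could be realized. All names here (proximity_graph, graph_kl, dual_teacher_proximity_loss, the temperature tau, the weight vector w) are assumptions made for illustration, not identifiers from the AwoForget codebase.

```python
import torch
import torch.nn.functional as F

def proximity_graph(a: torch.Tensor, b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # Row-stochastic similarity graph between two embedding sets.
    # a == b yields an intra-modal graph (image-image or text-text);
    # a != b yields the inter-modal image-text graph.
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return F.softmax(a @ b.t() / tau, dim=-1)  # shape (B, B)

def graph_kl(g_teacher: torch.Tensor, g_student: torch.Tensor) -> torch.Tensor:
    # Per-sample KL(teacher row || student row); returns shape (B,).
    eps = 1e-8
    return (g_teacher * (g_teacher.clamp_min(eps).log()
                         - g_student.clamp_min(eps).log())).sum(dim=-1)

def dual_teacher_proximity_loss(s_img, s_txt, zs_img, zs_txt, ft_img, ft_txt, w):
    # s_*  : student image/text features, shape (B, D)
    # zs_* : features from the frozen zero-shot CLIP teacher
    # ft_* : features from the teacher adapted on previous tasks
    # w    : per-sample weights in [0, 1], shape (B,); larger values
    #        lean on the zero-shot teacher, smaller on the adapted one.
    def relational_kl(t_img, t_txt):
        # Intra-modal (image-image, text-text) plus inter-modal
        # (image-text) proximity graphs, compared row by row.
        loss = graph_kl(proximity_graph(t_img, t_img),
                        proximity_graph(s_img, s_img))
        loss += graph_kl(proximity_graph(t_txt, t_txt),
                         proximity_graph(s_txt, s_txt))
        loss += graph_kl(proximity_graph(t_img, t_txt),
                         proximity_graph(s_img, s_txt))
        return loss  # shape (B,)

    zs_loss = relational_kl(zs_img, zs_txt)   # stay close to zero-shot CLIP
    ft_loss = relational_kl(ft_img, ft_txt)   # stay close to past adaptation
    return (w * zs_loss + (1.0 - w) * ft_loss).mean()

# Toy usage: six (B, D) feature batches and a per-sample weight vector.
B, D = 32, 512
feats = [torch.randn(B, D) for _ in range(6)]
w = torch.rand(B)
loss = dual_teacher_proximity_loss(*feats, w)
```

In this reading, w plays the role of the sample re-weighting mechanism the abstract mentions: samples the zero-shot teacher still represents well keep its relational structure, while samples from the new distribution defer to the previously adapted teacher.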
Pages: 109 - 125
Page count: 17
Related Papers
50 records in total
  • [1] Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
    Yu, Yu-Chu
    Huang, Chi-Pin
    Chen, Jr-Jen
    Chang, Kai-Po
    Lai, Yung-Hsuan
    Yang, Fu-En
    Wang, Yu-Chiang Frank
    COMPUTER VISION - ECCV 2024, PT XXVI, 2025, 15084 : 219 - 236
  • [2] GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
    Li, Xin
    Lian, Dongze
    Lu, Zhihe
    Bai, Jiawang
    Chen, Zhibo
    Wang, Xinchao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [4] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [5] Vision-Language Models for Biomedical Applications
    Thapa, Surendrabikram
    Naseem, Usman
    Zhou, Luping
    Kim, Jinman
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON VISION-LANGUAGE MODELS FOR BIOMEDICAL APPLICATIONS, VLM4BIO 2024, 2024, : 1 - 2
  • [6] The Neglected Tails in Vision-Language Models
    Parashar, Shubham
    Lin, Zhiqiu
    Liu, Tian
    Dong, Xiangjue
    Li, Yanan
    Ramanan, Deva
    Caverlee, James
    Kong, Shu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 12988 - 12997
  • [7] Vision-Language Models as Success Detectors
    Du, Yuqing
    Konyushkova, Ksenia
    Denil, Misha
    Raju, Akhil
    Landon, Jessica
    Hill, Felix
    de Freitas, Nando
    Cabi, Serkan
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 120 - 136
  • [8] Debiasing Vision-Language Models for Vision Tasks: A Survey
    Zhu, Beier
    Zhang, Hanwang
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (01)
  • [9] Contrastive Region Guidance: Improving Grounding in Vision-Language Models Without Training
    Wan, David
    Cho, Jaemin
    Stengel-Eskin, Elias
    Bansal, Mohit
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 198 - 215