Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

Cited by: 0
Authors
Tang, Longxiang [1 ]
Tian, Zhuotao [4 ]
Li, Kai [5 ]
He, Chunming [1 ]
Zhou, Hantao [1 ]
Zhao, Hengshuang [6 ]
Li, Xiu [1 ]
Jia, Jiaya [2 ,3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] SmartMore, Hong Kong, Peoples R China
[3] CUHK, Hong Kong, Peoples R China
[4] HIT SZ, Shenzhen, Peoples R China
[5] Meta Real Labs, Menlo Pk, CA USA
[6] HKU, Hong Kong, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT XXXVI | 2025 / Vol. 15094
DOI
10.1007/978-3-031-72764-1_20
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario in which both the domain distribution and the target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability. Existing methods tackle this by tuning VLMs with knowledge distillation on extra datasets, which incurs heavy computational overhead. To address this problem efficiently, we propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, which retains the pre-trained knowledge of VLMs by avoiding information interference. Specifically, we design a fully residual mechanism that infuses newly learned knowledge into a frozen backbone while introducing minimal adverse impact on pre-trained knowledge. Moreover, this residual property enables our distribution-aware integration calibration scheme, which explicitly controls the information-implantation process for test data from unseen distributions. Experiments demonstrate that DIKI surpasses the current state-of-the-art approach with only 0.86% of the trained parameters and substantially less training time. Code is available at: https://github.com/lloongx/DIKI.
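To make the abstract's two ideas concrete, the sketch below illustrates a fully residual module added to a frozen layer, gated by a per-sample distribution-aware scale. This is a minimal illustration under assumed conventions, not the authors' DIKI implementation (see the linked repository for that); the names ResidualAdapter and dist_score, the low-rank adapter form, and the diagonal-Gaussian scoring are all hypothetical.

```python
# Minimal sketch (assumptions noted above, NOT the official DIKI code).
# Idea 1: a purely residual, zero-initialized branch on a frozen layer,
#         so at initialization the pre-trained behavior is untouched.
# Idea 2: a distribution-aware scalar in [0, 1] that shrinks the residual
#         for inputs far from the task's training feature distribution.
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Low-rank residual branch added to a frozen layer's output."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # branch outputs zero at init: no interference

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(x)))


def dist_score(feat: torch.Tensor, mean: torch.Tensor, var: torch.Tensor) -> torch.Tensor:
    """Confidence in [0, 1] that `feat` matches the task's feature statistics.

    Uses a diagonal-Gaussian log-likelihood squashed by a sigmoid; the
    paper's actual calibration scheme may differ.
    """
    logp = -0.5 * (((feat - mean) ** 2) / var + var.log()).sum(-1)
    return torch.sigmoid(logp / feat.shape[-1])  # normalize by feature dimension


dim = 512
frozen_layer = nn.Linear(dim, dim)
for p in frozen_layer.parameters():
    p.requires_grad_(False)  # backbone stays frozen; only the adapter trains

adapter = ResidualAdapter(dim)
task_mean, task_var = torch.zeros(dim), torch.ones(dim)  # stats collected during training

x = torch.randn(4, dim)
base = frozen_layer(x)                                    # frozen, pre-trained features
alpha = dist_score(base, task_mean, task_var).unsqueeze(-1)
out = base + alpha * adapter(base)                        # gated residual injection
```

Because the branch is additive and zero-initialized, driving alpha to zero recovers the frozen model's output exactly, which is the property that would let such a calibration scheme fall back to zero-shot behavior on data from unseen distributions.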
Pages: 346-365
Number of pages: 20