Continual Learning with Confidence-based Multi-teacher Knowledge Distillation for Neural Machine Translation

Cited by: 0
Authors
Guo, Jiahua [1 ]
Liang, Yunlong [1 ]
Xu, Jinan [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing, Peoples R China
Source
2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024 | 2024
Funding
National Key R&D Program of China;
Keywords
neural machine translation; continual learning; knowledge distillation;
DOI
10.1109/ICNLP60986.2024.10692378
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Continual learning is widely used in practical applications of neural machine translation; it aims not only to achieve good performance on new domains but also to preserve the knowledge of previously learned domains. However, existing continual learning methods usually suffer from catastrophic forgetting in the multi-domain continual learning scenario: when the model is trained on multiple diverse domains one by one, performance on the earlier domains drops drastically. In this work, we propose a multi-teacher knowledge distillation technique to systematically alleviate catastrophic forgetting. First, we adopt multi-teacher knowledge distillation, where models from previous training stages serve as teachers. Second, to further preserve performance on previous domains, we propose a confidence-based integration mechanism in which multiple teachers are combined with sample-adaptive weights based on their performance. We conduct experiments in a multi-domain continual learning setting, where a pre-trained model is transferred sequentially to five diverse domains (IT, Law, Medical, Subtitles, Koran). Experimental results show that the proposed method achieves superior performance compared to several strong baselines.
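The abstract describes two components: using models from earlier training stages as teachers, and combining them with sample-adaptive, confidence-based weights. The following is a minimal PyTorch sketch of such a confidence-weighted multi-teacher distillation loss, for illustration only; the specific choices here (weighting each teacher by its mean log-likelihood of the reference, a softmax over those scores, a temperature, and an interpolation factor alpha) are assumptions made for the sketch, not necessarily the paper's exact formulation.

```python
# Minimal sketch of confidence-based multi-teacher knowledge distillation for NMT.
# The weighting scheme, temperature, and alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(student_logits, teacher_logits_list,
                                target_ids, pad_id, alpha=0.5, temperature=1.0):
    """student_logits: (B, T, V); teacher_logits_list: list of K tensors (B, T, V);
    target_ids: (B, T) reference token ids."""
    mask = (target_ids != pad_id).float()                                   # (B, T)

    # Per-sample confidence of each teacher: mean log-prob of the reference tokens.
    confidences = []
    for t_logits in teacher_logits_list:
        logp = F.log_softmax(t_logits, dim=-1)
        tok_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)    # (B, T)
        sent_logp = (tok_logp * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)  # (B,)
        confidences.append(sent_logp)
    weights = torch.softmax(torch.stack(confidences, dim=-1), dim=-1)       # (B, K)

    # Sample-adaptive mixture of the teacher distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list], dim=-1
    )                                                                        # (B, T, V, K)
    mixed = (teacher_probs * weights[:, None, None, :]).sum(-1)              # (B, T, V)

    # Cross-entropy between the mixed teacher distribution and the student
    # (equivalent to KL divergence up to the teachers' entropy).
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    kd = -(mixed * student_logp).sum(-1)                                     # (B, T)
    kd = (kd * mask).sum() / mask.sum().clamp(min=1.0)

    # Standard cross-entropy on the new-domain reference.
    ce = F.cross_entropy(student_logits.transpose(1, 2), target_ids,
                         ignore_index=pad_id)
    return (1 - alpha) * ce + alpha * kd
```

In this sketch the confidence weights are computed per sample, so a teacher from a previous stage that models a given sentence well contributes more to the mixed distribution the student is distilled toward, which is one way to realize "sample-adaptive weights based on their performance" as stated in the abstract.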
Pages: 336-343
Number of pages: 8
Related Papers
50 records in total
  • [21] Yang, Xinxin; Li, Xinwei; Liu, Zhen; Yuan, Yafan; Wang, Yannan. Multi-teacher knowledge distillation for debiasing recommendation with uniform data. EXPERT SYSTEMS WITH APPLICATIONS, 2025, 273.
  • [22] Lin, Yu-e; Yin, Shuting; Ding, Yifeng; Liang, Xingzhu. ATMKD: adaptive temperature guided multi-teacher knowledge distillation. MULTIMEDIA SYSTEMS, 2024, 30 (05).
  • [23] Cao, Yue; Wei, Hao-Ran; Chen, Boxing; Wan, Xiaojun. Continual Learning for Neural Machine Translation. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 3964-3974.
  • [24] Wang, Xintao; Jin, Rize; Qi, Shibo. Reinforced Multi-teacher Knowledge Distillation for Unsupervised Sentence Representation. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2024, PT VII, 2024, 15022: 320-332.
  • [25] Zhou, Weizhen; Huang, Zhenhua; Wang, Cheng; Chen, Yunwen. A multi-graph neural group recommendation model with meta-learning and multi-teacher distillation. KNOWLEDGE-BASED SYSTEMS, 2023, 276.
  • [26] Jiang, Yuxuan; Feng, Chen; Zhang, Fan; Bull, David. MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution. COMPUTER VISION - ECCV 2024, PT XXXIX, 2025, 15097: 364-382.
  • [27] Lu, Junxin; Sun, Shiliang. PAMK: Prototype Augmented Multi-Teacher Knowledge Transfer Network for Continual Zero-Shot Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 3353-3368.
  • [28] Cladiere, Tristan; Alata, Olivier; Ducottet, Christophe; Konik, Hubert; Legrand, Anne-Claire. Visual emotion analysis using skill-based multi-teacher knowledge distillation. PATTERN ANALYSIS AND APPLICATIONS, 2025, 28 (02).
  • [29] Yang, Yang; Liu, Dan. mKDNAD: A network flow anomaly detection method based on multi-teacher knowledge distillation. 2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022: 314-319.
  • [30] Liang, Yunlong; Meng, Fandong; Wang, Jiaan; Xu, Jinan; Chen, Yufeng; Zhou, Jie. Continual Learning with Semi-supervised Contrastive Distillation for Incremental Neural Machine Translation. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 10914-10928.