Continual Learning with Confidence-based Multi-teacher Knowledge Distillation for Neural Machine Translation

Cited by: 0
Authors
Guo, Jiahua [1]
Liang, Yunlong [1]
Xu, Jinan [1]
Affiliations
[1] Beijing Jiaotong University, Beijing, People's Republic of China
Source
2024 6th International Conference on Natural Language Processing (ICNLP 2024), 2024
Funding
National Key Research and Development Program of China
Keywords
neural machine translation; continual learning; knowledge distillation
DOI
10.1109/ICNLP60986.2024.10692378
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Continual learning is widely used in practical applications of neural machine translation, where the goal is not only to achieve good performance on new domains but also to preserve the knowledge of previously learned domains. However, existing continual learning methods usually suffer from catastrophic forgetting in the multi-domain continual learning scenario: when the model is trained on multiple diverse domains one after another, its performance on earlier domains drops drastically. In this work, we propose a multi-teacher knowledge distillation technique to systematically alleviate catastrophic forgetting. First, we adopt multi-teacher knowledge distillation, in which models from previous training stages serve as teachers. Second, to further preserve performance on previous domains, we propose a confidence-based integration mechanism that combines the teachers with sample-adaptive weights based on their performance. We conduct experiments in a multi-domain continual learning setting in which a pre-trained model is transferred sequentially to five diverse domains (IT, Law, Medical, Subtitles, Koran). Experimental results show that the proposed method achieves superior performance compared to several strong baselines.
Pages: 336-343
Number of pages: 8
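
The abstract only sketches the method, so the following is a minimal, hypothetical PyTorch example of confidence-based multi-teacher knowledge distillation as described there: frozen checkpoints from previous training stages act as teachers, each teacher's output distribution receives a per-sample weight derived from its confidence (approximated here by negative entropy, which is an assumption; the paper may define confidence and the overall loss differently), and the student is distilled toward the weighted mixture. All function and variable names are illustrative and not taken from the paper.

```python
# Illustrative sketch only (not the authors' code): confidence-weighted
# multi-teacher knowledge distillation on token-level output distributions.
import torch
import torch.nn.functional as F


def confidence_weighted_kd_loss(student_logits, teacher_logits_list, temperature=1.0):
    """student_logits: (batch, seq_len, vocab).
    teacher_logits_list: list of tensors of the same shape, one per
    previous-stage teacher. Returns a scalar distillation loss."""
    confidences = []
    teacher_probs_list = []
    for t_logits in teacher_logits_list:
        probs = F.softmax(t_logits / temperature, dim=-1)            # (B, T, V)
        # Per-sample confidence approximated by negative entropy,
        # averaged over target positions (an assumption, see lead-in).
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)     # (B, T)
        confidences.append(-entropy.mean(dim=-1))                    # (B,)
        teacher_probs_list.append(probs)

    # Sample-adaptive weights: softmax over teachers, separately per sample.
    weights = torch.softmax(torch.stack(confidences, dim=-1), dim=-1)  # (B, K)

    # Confidence-weighted mixture of the teachers' distributions.
    teacher_probs = torch.stack(teacher_probs_list, dim=0)   # (K, B, T, V)
    w = weights.permute(1, 0)[:, :, None, None]              # (K, B, 1, 1)
    mixed_teacher = (w * teacher_probs).sum(dim=0)            # (B, T, V)

    # Distill the student toward the mixed teacher distribution.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(student_log_probs, mixed_teacher, reduction="batchmean")
    return kd * (temperature ** 2)


if __name__ == "__main__":
    # Random tensors stand in for student/teacher model outputs.
    B, T, V, K = 4, 10, 32, 2
    student = torch.randn(B, T, V, requires_grad=True)
    teachers = [torch.randn(B, T, V) for _ in range(K)]
    loss = confidence_weighted_kd_loss(student, teachers)
    loss.backward()
    print(float(loss))
```

In a full continual learning setup, a term like this would typically be added to the standard cross-entropy loss on the new-domain training data, so the student adapts to the new domain while the weighted previous-stage teachers anchor it to earlier domains.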