Continual Learning with Confidence-based Multi-teacher Knowledge Distillation for Neural Machine Translation

Cited by: 0
Authors
Guo, Jiahua [1]
Liang, Yunlong [1]
Xu, Jinan [1]
Affiliations
[1] Beijing Jiaotong University, Beijing, People's Republic of China
Source
2024 6th International Conference on Natural Language Processing (ICNLP 2024), 2024
Funding
National Key Research and Development Program of China
Keywords
neural machine translation; continual learning; knowledge distillation
DOI
10.1109/ICNLP60986.2024.10692378
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Continual learning is widely used in practical applications of neural machine translation, where the goal is not only to achieve good performance on new domains but also to preserve the knowledge of previously learned domains. However, existing continual learning methods usually suffer from catastrophic forgetting in the multi-domain continual learning scenario: when the model is trained on multiple diverse domains one after another, its performance on earlier domains drops drastically. In this work, we propose a multi-teacher knowledge distillation technique to systematically alleviate catastrophic forgetting. First, we adopt multi-teacher knowledge distillation, in which models from previous training stages serve as teachers. Second, to further preserve performance on previous domains, we propose a confidence-based integration mechanism that combines the teachers with sample-adaptive weights based on their performance. We conduct experiments in a multi-domain continual learning setting in which a pre-trained model is transferred sequentially to five diverse domains (IT, Law, Medical, Subtitles, Koran). Experimental results show that the proposed method achieves superior performance compared to several strong baselines.
Pages: 336-343
Number of pages: 8
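
The abstract only sketches the method, so the following is a minimal, hypothetical PyTorch example of confidence-based multi-teacher knowledge distillation as described there: frozen checkpoints from previous training stages act as teachers, each teacher's output distribution receives a per-sample weight derived from its confidence (approximated here by negative entropy, which is an assumption; the paper may define confidence and the overall loss differently), and the student is distilled toward the weighted mixture. All function and variable names are illustrative and not taken from the paper.

```python
# Illustrative sketch only (not the authors' code): confidence-weighted
# multi-teacher knowledge distillation on token-level output distributions.
import torch
import torch.nn.functional as F


def confidence_weighted_kd_loss(student_logits, teacher_logits_list, temperature=1.0):
    """student_logits: (batch, seq_len, vocab).
    teacher_logits_list: list of tensors of the same shape, one per
    previous-stage teacher. Returns a scalar distillation loss."""
    confidences = []
    teacher_probs_list = []
    for t_logits in teacher_logits_list:
        probs = F.softmax(t_logits / temperature, dim=-1)            # (B, T, V)
        # Per-sample confidence approximated by negative entropy,
        # averaged over target positions (an assumption, see lead-in).
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)     # (B, T)
        confidences.append(-entropy.mean(dim=-1))                    # (B,)
        teacher_probs_list.append(probs)

    # Sample-adaptive weights: softmax over teachers, separately per sample.
    weights = torch.softmax(torch.stack(confidences, dim=-1), dim=-1)  # (B, K)

    # Confidence-weighted mixture of the teachers' distributions.
    teacher_probs = torch.stack(teacher_probs_list, dim=0)   # (K, B, T, V)
    w = weights.permute(1, 0)[:, :, None, None]              # (K, B, 1, 1)
    mixed_teacher = (w * teacher_probs).sum(dim=0)            # (B, T, V)

    # Distill the student toward the mixed teacher distribution.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(student_log_probs, mixed_teacher, reduction="batchmean")
    return kd * (temperature ** 2)


if __name__ == "__main__":
    # Random tensors stand in for student/teacher model outputs.
    B, T, V, K = 4, 10, 32, 2
    student = torch.randn(B, T, V, requires_grad=True)
    teachers = [torch.randn(B, T, V) for _ in range(K)]
    loss = confidence_weighted_kd_loss(student, teachers)
    loss.backward()
    print(float(loss))
```

In a full continual learning setup, a term like this would typically be added to the standard cross-entropy loss on the new-domain training data, so the student adapts to the new domain while the weighted previous-stage teachers anchor it to earlier domains.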