Continual Learning with Confidence-based Multi-teacher Knowledge Distillation for Neural Machine Translation

Cited by: 0
Authors
Guo, Jiahua [1 ]
Liang, Yunlong [1 ]
Xu, Jinan [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing, Peoples R China
Source
2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024 | 2024
Funding
National Key R&D Program of China;
Keywords
neural machine translation; continual learning; knowledge distillation;
DOI
10.1109/ICNLP60986.2024.10692378
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Continual learning is widely used in practical applications of neural machine translation; it aims not only to achieve good performance on new domains but also to preserve the knowledge of previously learned domains. However, existing continual learning methods usually suffer from catastrophic forgetting in the multi-domain continual learning scenario: when the model is trained on multiple diverse domains one by one, performance on the earlier domains drops drastically. In this work, we propose a multi-teacher knowledge distillation technique to systematically alleviate catastrophic forgetting. First, we adopt multi-teacher knowledge distillation, where models from previous training stages serve as teachers. Second, to further preserve performance on previous domains, we propose a confidence-based integration mechanism in which multiple teachers are combined with sample-adaptive weights based on their performance. We conduct experiments in a multi-domain continual learning setting, where a pre-trained model is transferred sequentially to five diverse domains (IT, Law, Medical, Subtitles, Koran). Experimental results show that the proposed method achieves superior performance compared to several strong baselines.
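The abstract describes two components: using models from earlier training stages as teachers, and combining them with sample-adaptive, confidence-based weights. The following is a minimal PyTorch sketch of such a confidence-weighted multi-teacher distillation loss, for illustration only; the specific choices here (weighting each teacher by its mean log-likelihood of the reference, a softmax over those scores, a temperature, and an interpolation factor alpha) are assumptions made for the sketch, not necessarily the paper's exact formulation.

```python
# Minimal sketch of confidence-based multi-teacher knowledge distillation for NMT.
# The weighting scheme, temperature, and alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(student_logits, teacher_logits_list,
                                target_ids, pad_id, alpha=0.5, temperature=1.0):
    """student_logits: (B, T, V); teacher_logits_list: list of K tensors (B, T, V);
    target_ids: (B, T) reference token ids."""
    mask = (target_ids != pad_id).float()                                   # (B, T)

    # Per-sample confidence of each teacher: mean log-prob of the reference tokens.
    confidences = []
    for t_logits in teacher_logits_list:
        logp = F.log_softmax(t_logits, dim=-1)
        tok_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)    # (B, T)
        sent_logp = (tok_logp * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)  # (B,)
        confidences.append(sent_logp)
    weights = torch.softmax(torch.stack(confidences, dim=-1), dim=-1)       # (B, K)

    # Sample-adaptive mixture of the teacher distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list], dim=-1
    )                                                                        # (B, T, V, K)
    mixed = (teacher_probs * weights[:, None, None, :]).sum(-1)              # (B, T, V)

    # Cross-entropy between the mixed teacher distribution and the student
    # (equivalent to KL divergence up to the teachers' entropy).
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    kd = -(mixed * student_logp).sum(-1)                                     # (B, T)
    kd = (kd * mask).sum() / mask.sum().clamp(min=1.0)

    # Standard cross-entropy on the new-domain reference.
    ce = F.cross_entropy(student_logits.transpose(1, 2), target_ids,
                         ignore_index=pad_id)
    return (1 - alpha) * ce + alpha * kd
```

In this sketch the confidence weights are computed per sample, so a teacher from a previous stage that models a given sentence well contributes more to the mixed distribution the student is distilled toward, which is one way to realize "sample-adaptive weights based on their performance" as stated in the abstract.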
Pages: 336-343
Number of pages: 8
Related Papers
50 records in total
  • [21] Yang, Xinxin; Li, Xinwei; Liu, Zhen; Yuan, Yafan; Wang, Yannan. Multi-teacher knowledge distillation for debiasing recommendation with uniform data. EXPERT SYSTEMS WITH APPLICATIONS, 2025, 273.
  • [22] Lin, Yu-e; Yin, Shuting; Ding, Yifeng; Liang, Xingzhu. ATMKD: adaptive temperature guided multi-teacher knowledge distillation. MULTIMEDIA SYSTEMS, 2024, 30 (05).
  • [23] Cao, Yue; Wei, Hao-Ran; Chen, Boxing; Wan, Xiaojun. Continual Learning for Neural Machine Translation. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 3964-3974.
  • [24] Wang, Xintao; Jin, Rize; Qi, Shibo. Reinforced Multi-teacher Knowledge Distillation for Unsupervised Sentence Representation. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2024, PT VII, 2024, 15022: 320-332.
  • [25] Zhou, Weizhen; Huang, Zhenhua; Wang, Cheng; Chen, Yunwen. A multi-graph neural group recommendation model with meta-learning and multi-teacher distillation. KNOWLEDGE-BASED SYSTEMS, 2023, 276.
  • [26] Jiang, Yuxuan; Feng, Chen; Zhang, Fan; Bull, David. MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution. COMPUTER VISION - ECCV 2024, PT XXXIX, 2025, 15097: 364-382.
  • [27] Lu, Junxin; Sun, Shiliang. PAMK: Prototype Augmented Multi-Teacher Knowledge Transfer Network for Continual Zero-Shot Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 3353-3368.
  • [28] Cladiere, Tristan; Alata, Olivier; Ducottet, Christophe; Konik, Hubert; Legrand, Anne-Claire. Visual emotion analysis using skill-based multi-teacher knowledge distillation. PATTERN ANALYSIS AND APPLICATIONS, 2025, 28 (02).
  • [29] Yang, Yang; Liu, Dan. mKDNAD: A network flow anomaly detection method based on multi-teacher knowledge distillation. 2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022: 314-319.
  • [30] Liang, Yunlong; Meng, Fandong; Wang, Jiaan; Xu, Jinan; Chen, Yufeng; Zhou, Jie. Continual Learning with Semi-supervised Contrastive Distillation for Incremental Neural Machine Translation. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 10914-10928.