Continual Learning with Confidence-based Multi-teacher Knowledge Distillation for Neural Machine Translation

Cited by: 0
Authors
Guo, Jiahua [1 ]
Liang, Yunlong [1 ]
Xu, Jinan [1 ]
Affiliations
[1] Beijing Jiaotong University, Beijing, People's Republic of China
Source
2024 6th International Conference on Natural Language Processing (ICNLP 2024), 2024
Funding
National Key Research and Development Program of China
Keywords
neural machine translation; continual learning; knowledge distillation;
DOI
10.1109/ICNLP60986.2024.10692378
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Continual learning is widely used in practical applications of neural machine translation; it aims not only to achieve good performance on new domains but also to preserve the knowledge of previously learned domains. However, existing continual learning methods usually suffer from catastrophic forgetting in the multi-domain continual learning scenario: when the model is trained on multiple diverse domains one after another, performance on the earlier domains drops drastically. In this work, we propose a multi-teacher knowledge distillation technique to systematically alleviate catastrophic forgetting. First, we adopt multi-teacher knowledge distillation, in which models from previous training stages serve as teachers. Second, to further improve performance on previous domains, we propose a confidence-based integration mechanism in which the teachers are combined with sample-adaptive weights based on their performance. We conduct experiments in a multi-domain continual learning setting in which a pre-trained model is sequentially transferred to five diverse domains (IT, Law, Medical, Subtitles, Koran). Experimental results show that the proposed method outperforms several strong baseline methods.
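The sketch below illustrates how a confidence-based combination of multiple teacher models could be implemented for token-level distillation in PyTorch. The function name multi_teacher_kd_loss, the choice of reference-token log-likelihood as the per-sample confidence measure, and the softmax weighting are illustrative assumptions for exposition, not the paper's exact formulation.

```python
# Minimal sketch of confidence-weighted multi-teacher knowledge distillation
# for NMT, assuming PyTorch and standard (batch, seq, vocab) logits.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, target_ids,
                          pad_id=0, temperature=1.0):
    """student_logits: (B, S, V); teacher_logits_list: list of (B, S, V); target_ids: (B, S)."""
    mask = (target_ids != pad_id).float()                                        # (B, S)

    # Per-sample confidence of each teacher: average log-probability of the reference tokens.
    confidences = []
    for t_logits in teacher_logits_list:
        log_probs = F.log_softmax(t_logits, dim=-1)
        tok_ll = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)      # (B, S)
        confidences.append((tok_ll * mask).sum(1) / mask.sum(1).clamp(min=1))    # (B,)
    weights = torch.softmax(torch.stack(confidences, dim=1), dim=1)              # (B, n_teachers)

    # Sample-adaptive mixture of the teachers' output distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list], dim=1
    )                                                                            # (B, n_teachers, S, V)
    mixed = (weights[:, :, None, None] * teacher_probs).sum(1)                   # (B, S, V)

    # Cross-entropy between the mixed teacher distribution and the student.
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    kd = -(mixed * student_logp).sum(-1)                                         # (B, S)
    return (kd * mask).sum() / mask.sum().clamp(min=1)
```

In this sketch, a teacher that assigns higher likelihood to the reference translation of a given sample receives a larger weight for that sample, so the distillation signal for each example leans toward the teacher whose domain knowledge fits it best.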
Pages: 336-343
Number of pages: 8