Continual Learning for Neural Machine Translation

Citations: 0
Authors
Cao, Yue [1 ,2 ,3 ]
Wei, Hao-Ran [4 ]
Chen, Boxing [4 ]
Wan, Xiaojun [1 ,2 ,3 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[3] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[4] Alibaba Grp, Hangzhou, Peoples R China
Source
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021) | 2021
Funding
National Natural Science Foundation of China;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural machine translation (NMT) models are data-driven and require a large-scale training corpus. In practical applications, NMT models are usually trained on a general-domain corpus and then fine-tuned by continuing training on an in-domain corpus. However, this bears the risk of catastrophic forgetting: performance on the general domain drops drastically. In this work, we propose a new continual learning framework for NMT models. We consider a scenario where training comprises multiple stages, and propose a dynamic knowledge distillation technique to systematically alleviate catastrophic forgetting. We also find that a bias arises in the output linear projection when fine-tuning on the in-domain corpus, and propose a bias-correction module to eliminate it. We conduct experiments on three representative NMT application settings. Experimental results show that the proposed method outperforms baseline models in all settings.(1)
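The knowledge-distillation idea in the abstract can be sketched as a training loss that mixes cross-entropy on the in-domain references with a distillation term pulling the current model toward the previous-stage (general-domain) model. This is an illustrative sketch only, not the authors' implementation: the function name, the fixed mixing weight `alpha`, and the temperature are assumptions; the paper's contribution is precisely a *dynamic* schedule for the distillation weight, which is not reproduced here.

```python
import numpy as np

def softmax(x, t=1.0):
    # Temperature-scaled softmax over the vocabulary axis (numerically stable).
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, targets, alpha, temperature=1.0):
    """Continual-learning loss sketch: (1 - alpha) * CE + alpha * KD.

    student_logits : (batch, vocab) logits of the model being fine-tuned
    teacher_logits : (batch, vocab) logits of the frozen previous-stage model
    targets        : (batch,) gold token indices from the in-domain corpus
    alpha          : distillation weight (the paper adapts this dynamically;
                     here it is a fixed hyperparameter for illustration)
    """
    # Cross-entropy on the in-domain reference tokens.
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(len(targets)), targets] + 1e-12).mean()
    # KL(teacher || student) at temperature T, scaled by T^2 as in standard KD.
    t = temperature
    p_t = softmax(teacher_logits, t)
    p_st = softmax(student_logits, t)
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_st + 1e-12))).sum(axis=-1).mean() * t * t
    return (1.0 - alpha) * ce + alpha * kd
```

When the student equals the teacher, the distillation term vanishes, so with `alpha = 1.0` the loss is zero; as fine-tuning moves the student away from the teacher, the term grows and penalizes forgetting of the general-domain distribution.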
Pages: 3964-3974
Page count: 11