Towards Unified Multi-Domain Machine Translation With Mixture of Domain Experts

Cited by: 0
Authors
Lu, Jinliang [1 ,2 ]
Zhang, Jiajun [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Wuhan AI Res, Wuhan 430072, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Training; Adaptation models; Transformers; Task analysis; Speech processing; Machine translation; Switches; Machine Translation; Multi-domain; Mixture-of-experts
DOI
10.1109/TASLP.2023.3316451
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Multi-domain machine translation (MDMT) aims to construct models from mixed-domain training corpora that can switch translation between different domains. Previous studies either assume that the domain information is given and leverage this knowledge to guide the translation process, or suppose that the domain information is unknown and rely on the model to recognize it automatically. In practical scenarios, however, the cases are mixed: some sentences are labeled with domain information while others are unlabeled, which is beyond the capacity of previous methods. In this article, we propose a unified MDMT model with a mixture of sub-networks (experts) that handles inputs both with and without domain labels. The mixture consists of a shared expert and multiple domain-specific experts. For inputs with domain labels, the model routes through the shared expert and the corresponding domain-specific expert. For unlabeled inputs, the model activates all the experts, each of which makes a dynamic contribution. Experimental results on multiple diverse domains in De→En, Fr→En, and En→Ro demonstrate that our method outperforms strong baselines in both scenarios, with and without domain labels. Further analyses show that our model generalizes well when transferring to new domains.
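Since the abstract only outlines the routing scheme, the following is a minimal PyTorch sketch of how a mixture of one shared expert and several domain-specific experts could be wired. It is illustrative, not the paper's implementation: the class name DomainMoELayer, the feed-forward form of each expert, the additive combination with the shared expert, and the token-level softmax gate for unlabeled inputs are all assumptions filled in from standard mixture-of-experts practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional

class DomainMoELayer(nn.Module):
    """Mixture of one shared expert and several domain-specific experts.

    Labeled input   -> shared expert + the labeled domain's expert.
    Unlabeled input -> shared expert + all domain experts, mixed by a
    learned softmax gate so each expert contributes dynamically.
    """

    def __init__(self, d_model: int, d_ff: int, num_domains: int):
        super().__init__()

        def make_expert() -> nn.Module:
            # Each expert is a plain Transformer-style feed-forward block
            # (an assumption; the abstract does not specify the expert form).
            return nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.ReLU(),
                nn.Linear(d_ff, d_model),
            )

        self.shared_expert = make_expert()
        self.domain_experts = nn.ModuleList(
            make_expert() for _ in range(num_domains)
        )
        # Token-level gate over domain experts, used only without a label.
        self.gate = nn.Linear(d_model, num_domains)

    def forward(self, x: torch.Tensor,
                domain: Optional[int] = None) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        shared_out = self.shared_expert(x)
        if domain is not None:
            # Domain label given: activate only the matching expert.
            return shared_out + self.domain_experts[domain](x)
        # No label: every domain expert fires, weighted by the gate.
        weights = F.softmax(self.gate(x), dim=-1)        # (B, S, E)
        expert_outs = torch.stack(
            [expert(x) for expert in self.domain_experts], dim=-1
        )                                                # (B, S, d_model, E)
        mixed = (expert_outs * weights.unsqueeze(-2)).sum(dim=-1)
        return shared_out + mixed

# Usage: one layer serves both the labeled and the unlabeled case.
layer = DomainMoELayer(d_model=512, d_ff=2048, num_domains=4)
x = torch.randn(2, 10, 512)
y_labeled = layer(x, domain=1)  # domain information available
y_unlabeled = layer(x)          # domain information unknown
```

The sketch mirrors the dual path described in the abstract: with a domain label, only the matching expert is activated alongside the shared one; without a label, every expert contributes with a learned weight.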
Pages: 3488-3498
Page count: 11