Language Model Pre-training Method in Machine Translation Based on Named Entity Recognition

Times Cited: 9
Authors
Li, Zhen [1 ]
Qu, Dan [1 ]
Xie, Chaojie [2 ]
Zhang, Wenlin [1 ]
Li, Yanxia [3 ]
Affiliations
[1] PLA Strateg Support Force Informat Engn Univ, Informat Syst Engn Coll, 93 Hightech Zone, Zhengzhou 450000, Peoples R China
[2] Zhengzhou Xinda Inst Adv Technol, 93 Hightech Zone, Zhengzhou 450000, Peoples R China
[3] PLA Strateg Support Force Informat Engn Univ, Foreign Languages Coll, 93 Hightech Zone, Zhengzhou 450000, Peoples R China
Keywords
Unsupervised machine translation; language model; named entity recognition;
DOI
10.1142/S0218213020400217
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Neural Machine Translation (NMT) model has become the mainstream technology in machine translation. Supervised NMT models are trained on abundant sentence-level parallel corpora, but for low-resource languages or dialects with no such corpus available, it is difficult to achieve good performance. Researchers have therefore turned to unsupervised neural machine translation (UNMT), which uses only monolingual corpora as training data. UNMT needs to construct a language model (LM) that learns semantic information from the monolingual corpus. This paper focuses on the pre-training of the LM in unsupervised machine translation and proposes a pre-training method, NER-MLM (named entity recognition masked language model). By performing NER, the proposed method obtains better semantic information and language model parameters with better training results. On the unsupervised machine translation task, the BLEU scores on the WMT'16 English-French and English-German data sets are 35.30 and 27.30, respectively. To the best of our knowledge, these are the highest results reported in the field of UNMT so far.
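The core idea of NER-MLM, as described in the abstract, is to let named-entity spans guide which tokens a masked language model must predict during pre-training. The following is a minimal, self-contained sketch of that idea only; the function name `ner_mlm_mask`, the masking budget rule, and the fallback to random tokens are assumptions for illustration and may differ from the paper's exact procedure.

```python
import random

MASK = "[MASK]"

def ner_mlm_mask(tokens, entity_spans, mask_prob=0.15, seed=0):
    """NER-guided masking (a sketch of the NER-MLM idea):
    preferentially mask tokens inside named-entity spans, then
    fill any remaining masking budget with randomly chosen tokens.
    entity_spans is a list of (start, end) token index pairs.
    Returns the masked token list and a dict mapping each masked
    position to the original token (the MLM prediction targets)."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_prob))

    # Collect all token positions covered by named entities.
    entity_idx = [i for s, e in entity_spans for i in range(s, e)]
    rng.shuffle(entity_idx)
    chosen = set(entity_idx[:n_mask])

    # If entities alone do not fill the budget, mask random tokens too.
    if len(chosen) < n_mask:
        rest = [i for i in range(len(tokens)) if i not in chosen]
        rng.shuffle(rest)
        chosen.update(rest[: n_mask - len(chosen)])

    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    labels = {i: tokens[i] for i in chosen}
    return masked, labels

# Example: the entity spans mark "Zhengzhou" and "Henan" as named entities.
tokens = "Zhengzhou is a city in Henan".split()
masked, labels = ner_mlm_mask(tokens, entity_spans=[(0, 1), (5, 6)])
```

With a 15% budget over six tokens, exactly one token is masked here, and the entity-first rule guarantees it is one of the two named entities rather than a random function word.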
Pages: 10