Mongolian-Chinese Unsupervised Neural Machine Translation with Lexical Feature

Cited by: 1
Authors
Wu, Ziyu [1 ]
Hou, Hongxu [1 ]
Guo, Ziyue [1 ]
Wang, Xuejiao [1 ]
Sun, Shuo [1 ]
Affiliations
[1] Inner Mongolia Univ, Dept Comp Sci, Hohhot, Peoples R China
Source
CHINESE COMPUTATIONAL LINGUISTICS, CCL 2019 | 2019 / Vol. 11856
Keywords
Mongolian-Chinese; Neural machine translation; Unsupervised method; Stem-affix segmentation;
DOI
10.1007/978-3-030-32381-3_27
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine translation has achieved impressive performance with the advances in deep learning, but relies on large-scale parallel corpora. There have been many attempts to extend these successes to low-resource languages, yet most still require substantial parallel data. In this study, we build a Mongolian-Chinese neural machine translation model based on unsupervised methods. Cross-lingual word embedding training plays a crucial role in unsupervised machine translation: training methods based on generative adversarial networks (GANs) only perform well between two closely related languages, whereas the self-learning method can learn high-quality bilingual embedding mappings for low-resource languages without any parallel corpora. In this work, applying the self-learning method instead of GANs improves the BLEU score by 1.0. On this basis, we analyze the lexical features of Mongolian words and use stem-affix segmentation for Mongolian in place of the Byte-Pair-Encoding (BPE) operation, so that cross-lingual word embedding training is more accurate, and we obtain higher-quality bilingual word embeddings that enhance translation performance. We report a BLEU score of 15.2 on the CWMT2017 Mongolian-Chinese dataset without using any parallel corpora during training.
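The self-learning bilingual mapping the abstract refers to can be sketched as follows. This is a minimal illustration on synthetic embeddings, in the spirit of the self-learning method of Artetxe et al. that the abstract builds on: the dimensions, vocabulary size, and tiny seed dictionary are invented for the example, not the paper's actual Mongolian-Chinese setup. Each iteration alternates between (1) inducing a word-translation dictionary by nearest-neighbor search under the current mapping and (2) re-estimating an orthogonal mapping from that dictionary via Procrustes analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 50                                 # embedding dim, vocab size per language (illustrative)
W_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden "true" orthogonal map
X = rng.normal(size=(n, d))                  # source-language word embeddings
Z = X @ W_true.T                             # target embeddings: same words, rotated space

def normalize(M):
    """Length-normalize rows so dot products become cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def procrustes(X_src, Z_tgt):
    """Orthogonal W minimizing ||X_src @ W.T - Z_tgt||_F for row-aligned pairs."""
    U, _, Vt = np.linalg.svd(Z_tgt.T @ X_src)
    return U @ Vt

seed = np.arange(5)                          # tiny seed dictionary: word i <-> word i
W = procrustes(X[seed], Z[seed])             # initial mapping from the seed only

for _ in range(10):                          # self-learning loop
    sims = normalize(X @ W.T) @ normalize(Z).T   # cosine similarity of all word pairs
    induced = sims.argmax(axis=1)            # 1) induce a full dictionary
    W = procrustes(X, Z[induced])            # 2) re-fit the mapping on it

sims = normalize(X @ W.T) @ normalize(Z).T   # evaluate the final mapping
accuracy = (sims.argmax(axis=1) == np.arange(n)).mean()
print(f"induced-dictionary accuracy: {accuracy:.2f}")
```

On this synthetic data, where the target space is an exact rotation of the source space, a few iterations typically grow the 5-pair seed into a full, mostly correct dictionary; real Mongolian-Chinese embeddings trained on monolingual corpora would of course be noisier, which is where the stem-affix segmentation described in the abstract helps produce cleaner Mongolian embeddings.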
Pages: 334-345
Number of pages: 12