Mongolian-Chinese Unsupervised Neural Machine Translation with Lexical Feature

Cited by: 1
Authors
Wu, Ziyu [1 ]
Hou, Hongxu [1 ]
Guo, Ziyue [1 ]
Wang, Xuejiao [1 ]
Sun, Shuo [1 ]
Affiliations
[1] Inner Mongolia Univ, Dept Comp Sci, Hohhot, Peoples R China
Source
CHINESE COMPUTATIONAL LINGUISTICS, CCL 2019 | 2019 / Vol. 11856
Keywords
Mongolian-Chinese; Neural machine translation; Unsupervised method; Stem-affix segmentation;
DOI
10.1007/978-3-030-32381-3_27
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine translation has achieved impressive performance with the advances in deep learning, but relies on large-scale parallel corpora. There have been many attempts to extend these successes to low-resource languages, yet most still require substantial parallel data. In this study, we build a Mongolian-Chinese neural machine translation model based on unsupervised methods. Cross-lingual word embedding training plays a crucial role in unsupervised machine translation: training methods based on generative adversarial networks (GANs) only perform well between two closely related languages, whereas the self-learning method can learn high-quality bilingual embedding mappings for low-resource languages without any parallel corpora. In this work, applying the self-learning method instead of GANs improves the BLEU score by 1.0. On this basis, we analyze the lexical features of Mongolian words and use stem-affix segmentation for Mongolian in place of the Byte-Pair-Encoding (BPE) operation, so that cross-lingual word embedding training is more accurate, and we obtain higher-quality bilingual word embeddings that enhance translation performance. We report a BLEU score of 15.2 on the CWMT2017 Mongolian-Chinese dataset without using any parallel corpora during training.
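The self-learning bilingual mapping the abstract refers to can be sketched as follows. This is a minimal illustration on synthetic embeddings, in the spirit of the self-learning method of Artetxe et al. that the abstract builds on: the dimensions, vocabulary size, and tiny seed dictionary are invented for the example, not the paper's actual Mongolian-Chinese setup. Each iteration alternates between (1) inducing a word-translation dictionary by nearest-neighbor search under the current mapping and (2) re-estimating an orthogonal mapping from that dictionary via Procrustes analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 50                                 # embedding dim, vocab size per language (illustrative)
W_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden "true" orthogonal map
X = rng.normal(size=(n, d))                  # source-language word embeddings
Z = X @ W_true.T                             # target embeddings: same words, rotated space

def normalize(M):
    """Length-normalize rows so dot products become cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def procrustes(X_src, Z_tgt):
    """Orthogonal W minimizing ||X_src @ W.T - Z_tgt||_F for row-aligned pairs."""
    U, _, Vt = np.linalg.svd(Z_tgt.T @ X_src)
    return U @ Vt

seed = np.arange(5)                          # tiny seed dictionary: word i <-> word i
W = procrustes(X[seed], Z[seed])             # initial mapping from the seed only

for _ in range(10):                          # self-learning loop
    sims = normalize(X @ W.T) @ normalize(Z).T   # cosine similarity of all word pairs
    induced = sims.argmax(axis=1)            # 1) induce a full dictionary
    W = procrustes(X, Z[induced])            # 2) re-fit the mapping on it

sims = normalize(X @ W.T) @ normalize(Z).T   # evaluate the final mapping
accuracy = (sims.argmax(axis=1) == np.arange(n)).mean()
print(f"induced-dictionary accuracy: {accuracy:.2f}")
```

On this synthetic data, where the target space is an exact rotation of the source space, a few iterations typically grow the 5-pair seed into a full, mostly correct dictionary; real Mongolian-Chinese embeddings trained on monolingual corpora would of course be noisier, which is where the stem-affix segmentation described in the abstract helps produce cleaner Mongolian embeddings.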
Pages: 334-345
Number of pages: 12