Unsupervised Statistical Machine Translation

Cited by: 0
Authors
Artetxe, Mikel [1 ]
Labaka, Gorka [1 ]
Agirre, Eneko [1 ]
Affiliations
[1] Univ Basque Country, UPV EHU, IXA NLP Grp, Bilbao, Spain
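Source
2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018) | 2018
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract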
While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses.
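The phrase-table induction step mentioned in the abstract can be illustrated with a small sketch. It assumes the source and target embeddings have already been mapped into a shared cross-lingual space, and it scores each candidate translation with a softmax over cosine similarities. The abstract does not specify the scoring function, so the softmax, the `temperature` and `top_k` parameters, the function names, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_phrase_table(src_emb, tgt_emb, top_k=2, temperature=0.1):
    """Build a toy phrase table from embeddings in a shared space.

    For each source entry, keep the top_k most similar target entries and
    convert their similarities into probabilities with a softmax.
    The temperature is a hypothetical hyperparameter (assumption).
    """
    table = {}
    for src, u in src_emb.items():
        # Rank all target entries by cosine similarity, highest first.
        scored = sorted(
            ((cosine(u, v), tgt) for tgt, v in tgt_emb.items()),
            reverse=True,
        )[:top_k]
        # Softmax over the retained candidates only.
        weights = [math.exp(score / temperature) for score, _ in scored]
        total = sum(weights)
        table[src] = {tgt: w / total for (_, tgt), w in zip(scored, weights)}
    return table


# Toy 2-dimensional embeddings, invented for this example.
src_emb = {"house": [1.0, 0.1], "dog": [0.1, 1.0]}
tgt_emb = {"maison": [0.9, 0.2], "chien": [0.2, 0.9], "chat": [0.3, 0.8]}

phrase_table = induce_phrase_table(src_emb, tgt_emb)
best = {src: max(cands, key=cands.get) for src, cands in phrase_table.items()}
print(best)
```

In a full pipeline of the kind the abstract describes, scores like these would be only one feature, combined with an n-gram language model and tuned weights; this sketch covers the table-induction idea alone.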
Pages: 3632-3642
Page count: 11
Related papers
26 in total
  • [1] [Anonymous], 2013, Short Papers
  • [2] Artetxe M., 2018, INT C LEARN REPR, DOI 10.18653/V1/D18-1399
  • [3] Artetxe M, 2018, AAAI CONF ARTIF INTE, P5012
  • [4] Artetxe M, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P789
  • [5] Learning bilingual word embeddings with (almost) no bilingual data
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017: 451-462
  • [6] Brown P. F., 1990, Computational Linguistics, V16, P79
  • [7] Conneau A., 2018, 6 INT C LEARNING REP
  • [8] Dou Q., 2012, P 2012 JOINT C EMP M, P266
  • [9] Dou Q., 2013, P 2013 C EMP METH NA, P1668
  • [10] Dou Q, 2015, PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1, P836