On the Limitations of Unsupervised Bilingual Dictionary Induction

Authors
Sogaard, Anders [1]
Ruder, Sebastian [2, 3]
Vulic, Ivan [4]
Affiliations
[1] Univ Copenhagen, Copenhagen, Denmark
[2] Natl Univ Ireland, Insight Res Ctr, Galway, Ireland
[3] Aylien Ltd, Dublin, Ireland
[4] Univ Cambridge, Language Technol Lab, Cambridge, England
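Source
PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1 | 2018
Funding
Science Foundation Ireland
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract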
Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
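The "identical words" trick mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the seed pairs are used, so the orthogonal Procrustes mapping below is an assumption chosen for illustration, not necessarily the authors' procedure. The function names (`identical_word_seed`, `procrustes`, `induce_dictionary`), the toy vocabularies, and the synthetic embeddings are all hypothetical.

```python
import numpy as np


def identical_word_seed(src_vocab, tgt_vocab):
    """Return index pairs (i, j) with src_vocab[i] == tgt_vocab[j]."""
    tgt_index = {word: j for j, word in enumerate(tgt_vocab)}
    return [(i, tgt_index[w]) for i, w in enumerate(src_vocab) if w in tgt_index]


def procrustes(X, Y):
    """Orthogonal matrix W minimizing the Frobenius norm of X @ W - Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def induce_dictionary(src_vocab, tgt_vocab, src_emb, tgt_emb):
    """Map source embeddings with W, then pick cosine nearest neighbours."""
    seed = identical_word_seed(src_vocab, tgt_vocab)
    src_idx, tgt_idx = zip(*seed)
    # Assumption: the seed pairs are used to fit an orthogonal mapping.
    W = procrustes(src_emb[list(src_idx)], tgt_emb[list(tgt_idx)])
    mapped = src_emb @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    target = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    nearest = (mapped @ target.T).argmax(axis=1)
    return {s: tgt_vocab[j] for s, j in zip(src_vocab, nearest)}


# Synthetic toy data: the target space is a rotated, reordered copy of the
# source space, so an exact orthogonal mapping exists by construction.
rng = np.random.default_rng(0)
src_vocab = ["taxi", "hotel", "radio", "hund", "katze"]
tgt_vocab = ["radio", "dog", "taxi", "cat", "hotel"]
src_emb = rng.normal(size=(5, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
tgt_emb = src_emb[[2, 3, 0, 4, 1]] @ Q  # row j holds the translation of tgt_vocab[j]

seed = identical_word_seed(src_vocab, tgt_vocab)
dictionary = induce_dictionary(src_vocab, tgt_vocab, src_emb, tgt_emb)
print(seed)
print(dictionary)
```

In this toy setup the three identically spelled words supply the only supervision, and the mapping fitted on them is then applied to the remaining words. Real embedding spaces are only approximately related by a rotation, so results on actual data would be noisier than in this constructed case.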
Pages: 778-788
Number of pages: 11