Learning bilingual word embeddings with (almost) no bilingual data

Cited by: 249
Authors
Artetxe, Mikel [1]
Labaka, Gorka [1]
Agirre, Eneko [1]
Affiliation
[1] Univ Basque Country UPV EHU, IXA NLP Grp, Leioa, Spain
Source
PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1 | 2017
DOI
10.18653/v1/P17-1042
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Most methods to learn bilingual word embeddings rely on large parallel corpora, which are difficult to obtain for most language pairs. This has motivated an active research line to relax this requirement, with methods that use document-aligned corpora or bilingual dictionaries of a few thousand words instead. In this work, we further reduce the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique. Our method exploits the structural similarity of embedding spaces, and works with as little bilingual evidence as a 25-word dictionary or even an automatically generated list of numerals, obtaining results comparable to those of systems that use richer resources.
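
The self-learning idea in the abstract can be illustrated with a minimal sketch: learn a mapping from a small seed dictionary, use the mapped embeddings to induce a larger dictionary by nearest-neighbor retrieval, and repeat. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes orthogonal Procrustes as the dictionary-based mapping step (the abstract says any such technique can be plugged in), assumes a simple "dictionary stopped changing" stopping rule, and uses hypothetical function names (`normalize`, `learn_mapping`, `induce_dictionary`, `self_learn`).

```python
import numpy as np

def normalize(M):
    """Scale each row to unit length so dot products are cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def learn_mapping(X, Z, pairs):
    """Assumed mapping step: orthogonal Procrustes on the dictionary pairs.

    Returns the orthogonal W minimizing ||X[src] @ W - Z[trg]||_F.
    """
    src, trg = zip(*pairs)
    U, _, Vt = np.linalg.svd(X[list(src)].T @ Z[list(trg)])
    return U @ Vt

def induce_dictionary(X, Z, W):
    """Pair every source word with its most similar target word after mapping."""
    sims = (X @ W) @ Z.T
    return [(i, int(j)) for i, j in enumerate(sims.argmax(axis=1))]

def self_learn(X, Z, seed_pairs, max_iter=20):
    """Alternate mapping and dictionary induction, starting from seed_pairs.

    X, Z: row-normalized source and target embedding matrices.
    Stops when the induced dictionary no longer changes (an assumed
    criterion chosen for simplicity) or after max_iter rounds.
    """
    pairs = list(seed_pairs)
    W = None
    for _ in range(max_iter):
        W = learn_mapping(X, Z, pairs)
        new_pairs = induce_dictionary(X, Z, W)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return W, pairs
```

Here `X` and `Z` hold one embedding per row, and `seed_pairs` is a list of `(source_index, target_index)` tuples, such as a 25-word dictionary or a list of shared numerals. Real embedding spaces are only approximately related by a rotation, so the induced dictionary would be noisy in practice; the sketch only shows the shape of the loop.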
Pages: 451-462
Page count: 12