From unified phrase representation to bilingual phrase alignment in an unsupervised manner

Authors
Liu, Jingshu [1, 2]
Morin, Emmanuel [1, 2]
Pena Saldarriaga, Sebastian [2]
Lark, Joseph [2]
Affiliations
[1] Univ Nantes, LS2N, UMR CNRS 6004, Nantes, France
[2] Dictanova, 6 Rue Rene Viviani, F-44200 Nantes, France
Keywords
Multilinguality; Phrase representation; Bilingual alignment; Word semantics
DOI
10.1017/S1351324922000328
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates both problems: a unified phrase representation model that takes cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system is a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training with comparable corpora and existing key phrase extraction, our encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that our method obtains state-of-the-art results on the bilingual phrase alignment task and improves the alignment of phrases of different lengths by a mean of 8.8 points in MAP.
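To make the evaluation setting concrete, here is a minimal sketch of how phrase representations that "can be compared without further transformation" might be ranked and scored with MAP. It assumes cosine similarity as the comparison function and uses toy 2-dimensional vectors; the function names, the vectors, and the gold dictionary are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(src_vec, tgt_vecs):
    """Return target phrases sorted by decreasing similarity to src_vec."""
    return sorted(tgt_vecs, key=lambda p: cosine(src_vec, tgt_vecs[p]), reverse=True)

def mean_average_precision(src_vecs, tgt_vecs, gold):
    """MAP over source phrases; gold maps each source phrase to a set of translations."""
    total = 0.0
    for src, vec in src_vecs.items():
        hits, ap = 0, 0.0
        for rank, cand in enumerate(rank_candidates(vec, tgt_vecs), start=1):
            if cand in gold[src]:
                hits += 1
                ap += hits / rank
        total += ap / len(gold[src]) if gold[src] else 0.0
    return total / len(src_vecs)

# Toy vectors standing in for encoder outputs in a shared space (assumption).
src_vecs = {"blade": [1.0, 0.0], "tower": [0.0, 1.0]}
tgt_vecs = {"pale": [0.9, 0.1], "tour": [0.5, 0.3], "rotor": [0.3, 0.5]}
gold = {"blade": {"pale"}, "tower": {"tour"}}

map_score = mean_average_precision(src_vecs, tgt_vecs, gold)
print(round(map_score, 2))  # 0.75
```

In this toy case "blade" retrieves its translation at rank 1 and "tower" at rank 2, so the average precisions are 1.0 and 0.5 and the MAP is 0.75.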
Pages: 643-668
Page count: 26