From unified phrase representation to bilingual phrase alignment in an unsupervised manner

Cited by: 0
Authors
Liu, Jingshu [1, 2]
Morin, Emmanuel [1, 2]
Pena Saldarriaga, Sebastian [2]
Lark, Joseph [2]
Affiliations
[1] Univ Nantes, LS2N, UMR CNRS 6004, Nantes, France
[2] Dictanova, 6 Rue Rene Viviani, F-44200 Nantes, France
Keywords
Multilinguality; Phrase representation; Bilingual alignment; Word semantics
DOI
10.1017/S1351324922000328
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates these two problems: a unified phrase representation model that takes cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training with comparable corpora and existing key phrase extraction, the encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that the method obtains state-of-the-art results on the bilingual phrase alignment task and improves the results of different length phrase alignment by a mean of 8.8 points in MAP.
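To make the last two sentences of the abstract concrete, the following is a minimal sketch of how phrase vectors that live in a shared cross-lingual space could be ranked and scored with mean average precision (MAP). It is not the authors' implementation: the use of cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions made for illustration, and the encoder that would produce the vectors is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to source_vec."""
    scored = [(cosine(source_vec, vec), phrase) for phrase, vec in target_vecs.items()]
    # Sort by score descending, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored]


def average_precision(ranked, relevant):
    """Average of precision@rank over the ranks where a gold translation appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(source_vecs, target_vecs, gold):
    """MAP over all source phrases; gold maps a source phrase to a set of translations."""
    scores = [
        average_precision(rank_candidates(vec, target_vecs), gold.get(phrase, set()))
        for phrase, vec in source_vecs.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy vectors standing in for encoder outputs (assumed, not from the paper).
target_vecs = {
    "wind turbine": [1.0, 0.0],
    "solar panel": [0.0, 1.0],
    "power grid": [1.0, 1.0],
}
source_vecs = {
    "éolienne": [0.9, 0.1],
    "panneau solaire": [0.6, 0.8],
}
gold = {
    "éolienne": {"wind turbine"},
    "panneau solaire": {"solar panel"},
}

map_score = mean_average_precision(source_vecs, target_vecs, gold)
```

In this toy setup the first source phrase ranks its gold translation first and the second ranks it second, so the two average precisions are 1.0 and 0.5. Because the vectors are assumed to share one space already, no mapping matrix or other transformation is applied before comparison.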
Pages: 643-668
Number of pages: 26