A Unified and Unsupervised Framework for Bilingual Phrase Alignment on Specialized Comparable Corpora

Cited by: 0

Authors:

Liu, Jingshu [1,2]

Morin, Emmanuel [1]

Saldarriaga, Sebastian Pena [2]

Lark, Joseph [2]

Affiliations:

[1] Univ Nantes, UMR CNRS 6004, LS2N, Nantes, France

[2] Dictanova, Nantes, France

Source:

ECAI 2020: 24th European Conference on Artificial Intelligence | 2020 / Vol. 325

DOI:

10.3233/FAIA200332

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. In particular, this makes multi-word terms very difficult to align in specialized domains. This work proposes a system that alleviates these two problems: a unified phrase representation model using cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training, the encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five specialized domain datasets show that the method obtains state-of-the-art results on the bilingual phrase alignment task and improves the alignment of phrases of different lengths by a mean of 8.8 points in MAP.
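The abstract states that the trained encoder yields phrase representations that can be compared directly across languages, and that results are reported in MAP. The following is a minimal sketch of what such a comparison and evaluation could look like, not the authors' implementation. It assumes cosine similarity as the comparison function, and the vectors, the phrase pairs, and the function names `cosine`, `rank_candidates`, and `mean_average_precision` are hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to the source vector."""
    return sorted(
        target_vecs,
        key=lambda phrase: cosine(source_vec, target_vecs[phrase]),
        reverse=True,
    )


def mean_average_precision(rankings, gold):
    """MAP over source phrases; `gold` maps each source phrase to its set of correct targets."""
    total = 0.0
    for src, ranked in rankings.items():
        hits, precision_sum = 0, 0.0
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold[src]:
                hits += 1
                precision_sum += hits / rank
        total += precision_sum / len(gold[src]) if gold[src] else 0.0
    return total / len(rankings) if rankings else 0.0


# Toy, made-up vectors standing in for encoder outputs (assumption: the real
# representations would come from the trained phrase encoder).
source = {"wind turbine": [0.9, 0.1, 0.0], "blade": [0.0, 1.0, 0.2]}
target = {
    "éolienne": [0.8, 0.2, 0.1],
    "pale": [0.1, 0.9, 0.3],
    "rotor": [0.5, 0.5, 0.5],
}
gold = {"wind turbine": {"éolienne"}, "blade": {"pale"}}

rankings = {s: rank_candidates(vec, target) for s, vec in source.items()}
map_score = mean_average_precision(rankings, gold)
print(rankings)
print(map_score)
```

With a single correct translation per source phrase, as in this toy data, average precision reduces to the reciprocal rank of that translation.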

Pages: 2093-2100

Page count: 8
