A Unified and Unsupervised Framework for Bilingual Phrase Alignment on Specialized Comparable Corpora

Cited by: 0
Authors
Liu, Jingshu [1 ,2 ]
Morin, Emmanuel [1 ]
Saldarriaga, Sebastian Pena [2 ]
Lark, Joseph [2 ]
Affiliations
[1] Univ Nantes, UMR CNRS 6004, LS2N, Nantes, France
[2] Dictanova, Nantes, France
Source
ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020 / Vol. 325
DOI
10.3233/FAIA200332
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. In particular, this makes multi-word terms very difficult to align in specialized domains. This work proposes a system that alleviates these two problems with a unified phrase representation model using cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training, the encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five specialized domain datasets show that the method obtains state-of-the-art results on the bilingual phrase alignment task, and improves the alignment of phrases of different lengths by a mean of 8.8 points in MAP.
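The abstract says the encoder's phrase representations can be compared directly across languages and that results are reported in MAP. The sketch below illustrates that idea only. It is not the authors' implementation: the use of cosine similarity, the function names, the toy 3-dimensional vectors, and the example phrase pairs are all assumptions made for illustration. It uses one common definition of mean average precision.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to the source vector."""
    scored = [(cosine(source_vec, vec), phrase) for phrase, vec in target_vecs.items()]
    scored.sort(key=lambda pair: -pair[0])
    return [phrase for _, phrase in scored]


def mean_average_precision(rankings, gold):
    """Mean over source phrases of the average precision of each ranked list."""
    total = 0.0
    for src, ranked in rankings.items():
        hits, precision_sum = 0, 0.0
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold[src]:
                hits += 1
                precision_sum += hits / rank
        total += precision_sum / len(gold[src]) if gold[src] else 0.0
    return total / len(rankings) if rankings else 0.0


# Made-up vectors standing in for encoder outputs (hypothetical values).
source_vecs = {
    "wind turbine": [0.9, 0.1, 0.0],
    "rotor blade": [0.1, 0.9, 0.1],
}
target_vecs = {
    "éolienne": [0.8, 0.2, 0.1],
    "pale de rotor": [0.2, 0.8, 0.0],
    "réseau": [0.0, 0.1, 0.9],
}
# Hypothetical gold alignments for the toy example.
gold = {"wind turbine": {"éolienne"}, "rotor blade": {"pale de rotor"}}

rankings = {src: rank_candidates(vec, target_vecs) for src, vec in source_vecs.items()}
map_score = mean_average_precision(rankings, gold)
print(rankings)
print(map_score)
```

With real data, the toy dictionaries would be replaced by vectors produced by a trained encoder for source-language and target-language phrases, and the gold dictionary by a reference term list.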
Pages: 2093-2100
Page count: 8