From unified phrase representation to bilingual phrase alignment in an unsupervised manner

Cited by: 0
Authors
Liu, Jingshu [1, 2]
Morin, Emmanuel [1, 2]
Pena Saldarriaga, Sebastian [2]
Lark, Joseph [2]
Affiliations
[1] Univ Nantes, LS2N, UMR CNRS 6004, Nantes, France
[2] Dictanova, 6 Rue Rene Viviani, F-44200 Nantes, France
Keywords
Multilinguality; Phrase representation; Bilingual alignment; Word semantics
DOI
10.1017/S1351324922000328
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates these two problems: a unified phrase representation model that takes cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training with comparable corpora and existing key phrase extraction, the encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that the method obtains state-of-the-art results on the bilingual phrase alignment task and improves the alignment of phrases of different lengths by an average of 8.8 points in MAP.
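The abstract states that the encoder's phrase representations can be compared directly across languages and that results are reported in MAP (mean average precision). As a rough illustration only, the sketch below ranks target-language candidates by cosine similarity and scores the rankings with MAP. It is not the authors' implementation: the three-dimensional vectors, the toy phrases, and the function names `cosine`, `rank_candidates` and `mean_average_precision` are all assumptions made for this example, and cosine similarity is assumed as the comparison measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to the source vector."""
    scored = [(cosine(source_vec, vec), phrase) for phrase, vec in target_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored]


def mean_average_precision(rankings, gold):
    """Mean over source phrases of the average precision of each ranked list."""
    total = 0.0
    for source, ranked in rankings.items():
        hits = 0
        precision_sum = 0.0
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold[source]:
                hits += 1
                precision_sum += hits / rank
        total += precision_sum / len(gold[source]) if gold[source] else 0.0
    return total / len(rankings) if rankings else 0.0


# Hypothetical toy data: stand-ins for encoder outputs in a shared space.
source_vecs = {"wind": [0.9, 0.1, 0.0], "blade": [0.1, 0.9, 0.1]}
target_vecs = {
    "vent": [0.8, 0.2, 0.1],
    "pale": [0.2, 0.8, 0.0],
    "tour": [0.5, 0.5, 0.5],
}
gold = {"wind": {"vent"}, "blade": {"pale"}}

rankings = {src: rank_candidates(vec, target_vecs) for src, vec in source_vecs.items()}
map_score = mean_average_precision(rankings, gold)
print(rankings, map_score)
```

In this toy setting each correct translation is ranked first, so the MAP is at its maximum. With real encoder outputs the same two steps apply, but the vectors would come from the trained model rather than being written by hand.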
Pages: 643-668
Page count: 26