Unsupervised Cross-Lingual Mapping for Phrase Embedding Spaces

Times cited: 0
Authors
Ayana, Abraham G. [1]
Cao, Hailong [1]
Zhao, Tiejun [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Source
ADVANCES IN INFORMATION AND COMMUNICATION, VOL 2 | 2020, Vol. 1130
Keywords
Cross-lingual mapping; Word embedding; Phrase embedding; Machine translation; Mutual Information; Linear transformation;
DOI
10.1007/978-3-030-39442-4_38
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual embedding has proven an effective way to learn cross-lingual representations in a joint embedding space. Recent work has shown that cross-lingual phrase embeddings are important for inducing the phrase table in unsupervised phrase-based machine translation. However, most cross-lingual representations in the literature are either limited to word-level embeddings or use bilingual supervision to build a shared phrase embedding space. In this paper, we therefore explore ways to map the phrase embeddings of two languages into a common embedding space without supervision. Our model uses a three-step process. First, in the preprocessing stage, we identify phrases in a sentence using their mutual information and combine the component words of each phrase. Second, we independently learn phrase embeddings for each language based on their distributional properties. Finally, a fully unsupervised linear transformation method based on self-learning maps the phrase embeddings into a shared space. We extracted bilingual phrase translations as a gold standard to evaluate the system. In addition to being simple, the proposed method shows promising results for phrase embedding mapping.
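The three-step process in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pmi_bigrams`, `merge_phrases`, `procrustes`, `self_learning`), the `min_count` threshold, the use of plain pointwise mutual information over adjacent word pairs, and the orthogonal (Procrustes) form of the linear map are all assumptions made for illustration. Step two (training monolingual embeddings on the phrase-merged corpus) is omitted, and the self-learning loop assumes an initial dictionary is already available, whereas a fully unsupervised method must also induce that initial dictionary.

```python
import math
from collections import Counter

import numpy as np


def pmi_bigrams(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:  # hypothetical frequency cutoff
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def merge_phrases(tokens, phrases):
    """Join adjacent tokens that form a selected phrase, e.g. new_york."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def procrustes(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F for row-aligned X and Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def self_learning(X, Y, src_idx, tgt_idx, n_iter=5):
    """Alternate between fitting W and re-inducing the dictionary."""
    for _ in range(n_iter):
        W = procrustes(X[src_idx], Y[tgt_idx])
        sims = (X @ W) @ Y.T              # similarity of mapped source to target
        src_idx = np.arange(X.shape[0])
        tgt_idx = sims.argmax(axis=1)     # nearest target for every source row
    return W
```

In use, pairs returned by `pmi_bigrams` whose score exceeds a chosen threshold would be passed to `merge_phrases` for each sentence, embeddings would be trained separately per language on the merged corpora, and `self_learning` would then map the source matrix `X` into the space of the target matrix `Y`.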
Pages: 512-524
Number of pages: 13