Evaluating cross-lingual textual similarity on dictionary alignment problem

Cited by: 0
Authors
Yiğit Sever
Gönenç Ercan
Affiliations
[1] Middle East Technical University, Department of Computer Engineering
[2] Hacettepe University, Institute of Informatics
Source
Language Resources and Evaluation | 2020, Vol. 54
Keywords
Cross-lingual textual semantic similarity; Word embeddings; Wasserstein distance; Sinkhorn distance; Siamese neural network
Abstract
Bilingual or even polylingual word embeddings created many possibilities for tasks involving multiple languages. While some tasks like cross-lingual information retrieval aim to satisfy users’ multilingual information needs, some enable transferring valuable information from resource-rich languages to resource-poor ones. In any case, it is important to build and evaluate methods that operate in a cross-lingual setting. In this paper, Wordnet definitions in 7 different languages are used to create a semantic textual similarity testbed to evaluate cross-lingual textual semantic similarity methods. A document alignment task is created to be used between Wordnet glosses of synsets in 7 different languages. Unsupervised textual similarity methods—Wasserstein distance, Sinkhorn distance and cosine similarity—are compared with a supervised Siamese deep learning model. The task is modeled both as a retrieval task and an alignment task to investigate the hubness of the semantic similarity functions. Our findings indicate that considering the problem as a retrieval and alignment problem has a detrimental effect on the results. Furthermore, we show that cross-lingual textual semantic similarity can be used as an automated Wordnet construction method.
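As a rough illustration of one of the unsupervised measures named in the abstract, the sketch below computes a Sinkhorn (entropy-regularised optimal transport) distance between two small bags of word vectors. It is not the authors' implementation: the function names, the uniform word weights, the cosine-distance cost, the regularisation strength `reg`, and the toy 2-dimensional vectors are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sinkhorn_distance(xs, ys, reg=0.1, n_iters=200):
    """Entropy-regularised transport cost between two bags of vectors.

    Assumptions of this sketch: every word carries equal mass, and the
    ground cost between two words is their cosine distance.
    """
    n, m = len(xs), len(ys)
    cost = [[cosine_distance(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on the first bag
    b = [1.0 / m] * m  # uniform mass on the second bag
    u = [1.0] * n
    v = [1.0] * m
    # Sinkhorn iterations: alternately rescale rows and columns so the
    # transport plan matches both marginals.
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Cost of the resulting plan P[i][j] = u[i] * kernel[i][j] * v[j].
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Toy "definitions" as bags of hypothetical 2-d word vectors.
gloss_a = [[1.0, 0.0], [0.0, 1.0]]
gloss_b = [[1.0, 0.0], [0.0, 1.0]]    # same words as gloss_a
gloss_c = [[-1.0, 0.0], [0.0, -1.0]]  # words pointing the opposite way

d_same = sinkhorn_distance(gloss_a, gloss_b)
d_diff = sinkhorn_distance(gloss_a, gloss_c)
```

In a cross-lingual setting the two bags would hold vectors from a shared bilingual embedding space, one bag per definition; a smaller distance would then suggest that the two definitions describe the same synset.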
Pages: 1059-1078
Page count: 19