Linking Datasets Using Semantic Textual Similarity

Cited by: 12
Authors
McCrae, John P. [1]
Buitelaar, Paul [1]
Affiliations
[1] Natl Univ Ireland Galway, Insight Ctr Data Analyt, Galway H91 A06C, Ireland
Funding
EU Horizon 2020;
Keywords
Linked data; link discovery; ontology alignment; semantic textual similarity; structural similarity; NLP architectures;
DOI
10.2478/cait-2018-0010
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Linked data has been widely recognized as an important paradigm for representing data, and one of the most important aspects of supporting its use is the discovery of links between datasets. Many datasets carry a significant amount of textual information in the form of labels, descriptions and documentation about their elements, and the foundation of precise linking lies in applying semantic textual similarity to this text. However, most linking tools so far rely only on simple string similarity metrics such as Jaccard scores. We present an evaluation of some metrics that have performed well in recent semantic textual similarity evaluations and apply these to linking existing datasets.
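To illustrate the kind of simple string similarity baseline the abstract mentions, here is a minimal sketch of a token-level Jaccard score between two labels. The function name `jaccard` and the lowercase whitespace tokenization are assumptions made for illustration; they are not taken from the paper or from any particular linking tool.

```python
def jaccard(label_a: str, label_b: str) -> float:
    """Token-level Jaccard score: |A ∩ B| / |A ∪ B| over lowercase word sets.

    Hypothetical helper for illustration only; tokenization by whitespace
    is an assumption, not the paper's preprocessing.
    """
    tokens_a = set(label_a.lower().split())
    tokens_b = set(label_b.lower().split())
    if not tokens_a and not tokens_b:
        # Two empty labels are treated as identical by convention here.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Labels that share words score above zero ...
overlap_score = jaccard("heart attack", "attack of the heart")
# ... while synonyms with no shared tokens score zero, which is the
# limitation that motivates semantic textual similarity.
synonym_score = jaccard("car", "automobile")
```

Because the score depends only on shared surface tokens, two labels that mean the same thing but use different words are never linked by this measure alone.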
Pages: 109-123
Page count: 15