Measuring Structural Similarity Between RDF Graphs

Cited by: 12
Authors
Maillot, Pierre [1]
Bobed, Carlos [1]
Affiliations
[1] Univ Rennes, CNRS, IRISA, F-35000 Rennes, France
Source
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2018
Keywords
Similarity; Semantic Web; Linked Data; Data Mining
DOI
10.1145/3167132.3167342
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In recent years, a substantial effort has gone into publishing large amounts of data as RDF, thanks in part to the Linked Data initiative. In this context, shared ontologies have been crucial for achieving interoperability and for integrating and exploiting third-party datasets. However, using the same ontology does not suffice to successfully query or integrate external data within your own dataset: the actual usage of the vocabulary (e.g., which concepts have instances, which properties are actually populated, and how) is crucial for these tasks. Being able to compare different RDF graphs at the level of actual usage would help in such situations. Unfortunately, the complexity of graph comparison is an obstacle to the scalability of many approaches. In this article, we present a structural similarity measure designed to compare the low-level data of two RDF graphs according to the patterns they share. To obtain such patterns, we leverage a data mining method (KRIMP) that extracts the most descriptive patterns appearing in a transactional database. We adapt this method to the particularities of RDF data, proposing two different conversions of an RDF graph. Once we have the descriptive patterns, we evaluate how well two graphs can compress each other, which yields a numerical measure that depends on the common data structures they share. We have carried out several experiments to show the measure's ability to capture structural differences in actual vocabulary usage.
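The pipeline outlined in the abstract (convert each RDF graph to transactions, extract patterns, then check how well one graph's patterns compress the other) can be illustrated with a minimal sketch. This is not the authors' implementation. The conversion shown (one transaction per subject) is an assumed stand-in for the two conversions the paper proposes. `mine_patterns` is a simple frequency filter, not KRIMP. The ratio counts codes where the paper's measure would be based on encoded size. All function names and the sample triples are hypothetical.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

def to_transactions(triples):
    """Assumed conversion: one transaction per subject."""
    by_subject = defaultdict(set)
    for s, p, o in triples:
        # Class membership keeps the object; other triples keep only the predicate.
        item = f"type={o}" if p == RDF_TYPE else f"prop={p}"
        by_subject[s].add(item)
    return [frozenset(items) for items in by_subject.values()]

def mine_patterns(transactions, min_support=2):
    """Stand-in for KRIMP: keep whole itemsets seen at least min_support times."""
    counts = Counter(t for t in transactions if len(t) > 1)
    frequent = [p for p, n in counts.items() if n >= min_support]
    # Larger patterns first, so the greedy cover below prefers them.
    return sorted(frequent, key=lambda p: (-len(p), sorted(p)))

def cover_size(transaction, patterns):
    """Codes needed: one per matched pattern plus one per leftover item."""
    remaining = set(transaction)
    codes = 0
    for pattern in patterns:
        if pattern <= remaining:
            remaining -= pattern
            codes += 1
    return codes + len(remaining)

def compression_ratio(patterns, transactions):
    """Codes used with the patterns divided by codes used with single items only."""
    total_items = sum(len(t) for t in transactions)
    if total_items == 0:
        return 1.0
    total_codes = sum(cover_size(t, patterns) for t in transactions)
    return total_codes / total_items

# Hypothetical sample graphs as (subject, predicate, object) tuples.
graph_a = [
    ("ex:alice", RDF_TYPE, "ex:Person"), ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "ex:Person"), ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:knows", "ex:alice"),
]
graph_b = [
    ("ex:carol", RDF_TYPE, "ex:Person"), ("ex:carol", "ex:name", "Carol"),
    ("ex:carol", "ex:knows", "ex:dave"), ("ex:carol", "ex:age", "41"),
    ("ex:rennes", RDF_TYPE, "ex:City"), ("ex:rennes", "ex:label", "Rennes"),
]

tx_a, tx_b = to_transactions(graph_a), to_transactions(graph_b)
patterns_a = mine_patterns(tx_a)
ratio_self = compression_ratio(patterns_a, tx_a)   # graph A encoded with its own patterns
ratio_cross = compression_ratio(patterns_a, tx_b)  # graph B encoded with graph A's patterns
print(ratio_self, ratio_cross)
```

In this toy setting a lower ratio means the patterns describe the data better, so the gap between `ratio_cross` and `ratio_self` gives a rough indication of how much structure graph B shares with graph A. A symmetric score would also encode graph A with patterns mined from graph B.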
Pages: 1960-1967
Number of pages: 8