Data-driven assessment of structural evolution of RDF graphs

Times cited: 3
Authors
Bobed, Carlos [1, 2]
Maillot, Pierre [3]
Cellier, Peggy [4]
Ferré, Sébastien [3]
Affiliations
[1] Everis NTT Data, Barcelona, Spain
[2] Univ Zaragoza, Zaragoza, Spain
[3] Univ Rennes, CNRS, IRISA, Rennes, France
[4] Univ Rennes, INSA, CNRS, IRISA, Rennes, France
Keywords
Data evolution; data management; pattern mining; similarity measure; Semantic Web; efficient algorithm; Linked Data; quality; DBpedia
DOI
10.3233/SW-200368
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Since the birth of the Semantic Web, numerous knowledge bases have appeared. The applications that exploit them rely on the quality of their data over time. In this regard, one of the main dimensions of data quality is conformance to the expected usage of the vocabulary. However, vocabulary usage (i.e., how classes and properties are actually populated) can vary from one base to another. Moreover, such usage can evolve over time within a base and diverge from previous practices. Methods have been proposed to follow the evolution of a knowledge base by observing changes to its intensional schema (or ontology); however, they do not capture the evolution of its actual data, which can vary greatly in practice. In this paper, we propose a data-driven approach to assess the global evolution of vocabulary usage in large RDF graphs. Our proposal relies on two structural measures defined at different granularities (dataset vs. update), both based on pattern mining techniques. We have performed thorough experiments which show that our approach is scalable and can capture the structural evolution over time of both synthetic (LUBM) and real knowledge bases (different snapshots and updates of DBpedia).
Pages: 831-853
Number of pages: 23