Analysis of schema structures in the Linked Open Data graph based on unique subject URIs, pay-level domains, and vocabulary usage

Cited by: 6
Authors
Gottron, Thomas [1]
Knauf, Malte [1]
Scherp, Ansgar [2, 3]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol WeST, D-56070 Koblenz, Germany
[2] Univ Kiel, Kiel, Germany
[3] Leibniz Informat Ctr Econ, Kiel, Germany
Keywords
Linked Open Data; Schema analysis; Information; Entropy; Web
DOI
10.1007/s10619-014-7143-0
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Linked Open Data (LOD) graph represents a web-scale distributed knowledge graph interlinking information about entities across various domains. A core concept is the lack of a pre-defined schema, which allows data from all kinds of domains to be modelled flexibly. However, Linked Data does exhibit schema information in two ways: explicitly, by attaching RDF types to the entities, and implicitly, by using domain-specific properties to describe the entities. In this paper, we present and apply different techniques for investigating the schematic information encoded in the LOD graph at different levels of granularity. We investigate different information-theoretic properties of so-called Unique Subject URIs (USUs) and measure the correlation between the properties and types that can be observed for USUs on a large-scale semantic graph data set. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema information is far more discriminative and that applications using schema information based on either types or properties alone will capture only between 63.5% and 88.1% of the schema information contained in the data. As the level of discrimination depends on how data providers model and publish their data, in a second step we conducted an investigation based on pay-level domains (PLDs) as well as the semantic level of vocabularies. Overall, we observe that most data providers combine up to 10 vocabularies to model their data and that every fifth PLD uses a highly structured schema.
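The information-theoretic view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy USUs, the prefixed names, and the `entropy` helper are hypothetical, and the standard Shannon definitions of entropy and mutual information are assumed here, since the abstract does not state the exact measures used. Each USU is reduced to the set of RDF types attached to it (explicit schema) and the set of properties used to describe it (implicit schema).

```python
from collections import Counter
from math import log2


def entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical toy data: USU -> (type set, property set).
usus = {
    "ex:alice": (frozenset({"foaf:Person"}), frozenset({"foaf:name", "foaf:knows"})),
    "ex:bob": (frozenset({"foaf:Person"}), frozenset({"foaf:name", "foaf:mbox"})),
    "ex:paper1": (frozenset({"bibo:Article"}), frozenset({"dc:title", "dc:creator"})),
    "ex:paper2": (frozenset({"bibo:Article"}), frozenset({"dc:title", "dc:creator"})),
}

type_sets = [types for types, _ in usus.values()]
property_sets = [props for _, props in usus.values()]
joint = list(usus.values())  # (type set, property set) pairs

h_types = entropy(type_sets)       # information in explicit schema
h_props = entropy(property_sets)   # information in implicit schema
h_joint = entropy(joint)           # information in both combined

# Mutual information: how much types and properties tell us about each other.
mutual_information = h_types + h_props - h_joint

print(f"H(T)={h_types:.3f}  H(P)={h_props:.3f}  "
      f"H(T,P)={h_joint:.3f}  I(T;P)={mutual_information:.3f}")
```

In this toy data the property sets split the two persons apart while the type sets do not, so the property-set entropy is higher than the type-set entropy, which mirrors the abstract's observation that implicit schema information is more discriminative.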
Pages: 515-553
Number of pages: 39