Detecting Identical Entities in the Semantic Web Data

Cited by: 0
Authors
Holub, Michal [1]
Proksa, Ondrej [1]
Bielikova, Maria [1]
Affiliations
[1] Slovak Univ Technol Bratislava, Inst Informat & Software Engn, Fac Informat & Informat Technol, Bratislava 84216, Slovakia
Source
SOFSEM 2015: THEORY AND PRACTICE OF COMPUTER SCIENCE | 2015 / Vol. 8939
Keywords
duplicates; identity; similarity; relationship; semantic web; owl:sameAs; Linked Data; web of data
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The large number of entities published by various sources inevitably introduces inaccuracies, mainly duplicated information. Duplicates can be found even within a single dataset. In this paper we propose a method for automatically discovering identity relationships between two entities (also known as instance matching) in a dataset represented as a graph (e.g. in the Linked Data Cloud). The method can be used to clean existing datasets of duplicates, to validate existing identity relationships between entities within a dataset, or to connect different datasets using the owl:sameAs relationship. It is based on the analysis of sub-graphs formed by entities, their properties, and the existing relationships between them. It can learn a common similarity threshold for a particular dataset, so it adapts to the properties of different datasets. We evaluated the method in several experiments on data from the domains of public administration and digital libraries.
Pages: 519-530
Page count: 12