Ontology-based Entity Matching in Attributed Graphs

被引:24
作者
Ma, Hanchao [1 ]
Alipourlangouri, Morteza [1 ,2 ]
Wu, Yinghui [3 ]
Chiang, Fei [2 ]
Pi, Jiaxing [4 ]
机构
[1] Washington State Univ, Pullman, WA 99164 USA
[2] McMaster Univ, Hamilton, ON, Canada
[3] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[4] Siemens Corp Technol, Munich, Germany
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019年 / 12卷 / 10期
关键词
KEYS;
D O I
10.14778/3339490.3339501
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Keys for graphs incorporate the topology and value constraints needed to uniquely identify entities in a graph. They have been studied to support object identification, knowledge fusion, and social network reconciliation. Existing key constraints identify entities as the matches of a graph pattern by subgraph isomorphism, which enforce label equality on node types. These constraints can be too restrictive to characterize structures and node labels that are syntactically different but semantically equivalent. We propose a new class of key constraints, Ontological Graph Keys (OGKs) that extend conventional graph keys by ontological subgraph matching between entity labels and an external ontology. We show that the implication and validation problems for OGKs are each NP-complete. To reduce the entity matching cost, we also provide an algorithm to compute a minimal cover for OGKs. We then study the entity matching problem with OGKs, and a practical variant with a budget on the matching cost. We develop efficient algorithms to perform entity matching based on a (budgeted) Chase procedure. Using real-world graphs, we experimentally verify the efficiency and accuracy of OGK-based entity matching.
引用
收藏
页码:1195 / 1207
页数:13
相关论文
共 37 条
[1]  
Abiteboul S., 1995, Foundations of Databases, V8
[2]  
Akhtar W., 2011, LNCS, V6834, P23, DOI DOI 10.1007/978-3-642-23441-5
[3]  
Alipourlangouri M., 2018, VLDB WORKSH TD LSG
[4]   Survey of graph database models [J].
Angles, Renzo ;
Gutierrez, Claudio .
ACM COMPUTING SURVEYS, 2008, 40 (01)
[5]  
[Anonymous], 2001, 3 IAPR TC15 WORKSH G
[6]   A normal form for XML documents [J].
Arenas, M ;
Libkin, L .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2004, 29 (01) :195-232
[7]   Defining Key Semantics for the RDF Datasets: Experiments and Evaluations [J].
Atencia, Manuel ;
Chein, Michel ;
Croitoru, Madalina ;
David, Jerome ;
Leclere, Michel ;
Pernelle, Nathalie ;
Sais, Fatiha ;
Scharffe, Francois ;
Symeonidou, Danai .
GRAPH-BASED REPRESENTATION AND REASONING, 2014, 8577 :65-78
[8]   Efficient Discovery of Ontology Functional Dependencies [J].
Baskaran, Sridevi ;
Keller, Alexander ;
Chiang, Fei ;
Golab, Lukasz ;
Szlichta, Jaroslaw .
CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, :1847-1856
[9]   The Traveling Salesman Problem in Bounded Degree Graphs [J].
Bjorklund, Andreas ;
Husfeldt, Thore ;
Kaski, Petteri ;
Koivisto, Mikko .
ACM TRANSACTIONS ON ALGORITHMS, 2012, 8 (02)
[10]   Keys for XML [J].
Buneman, P ;
Davidson, S ;
Fan, WF ;
Hara, C ;
Tan, WC .
COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 2002, 39 (05) :473-487