When GDD meets GNN: A knowledge-driven neural connection for effective entity resolution in property graphs

Cited by: 0
Authors
Hu, Junwei
Bewong, Michael [1, 2, 3]
Kwashie, Selasi [3]
Zhang, Yidi [1]
Nofong, Vincent [4]
Wondoh, John [1, 2]
Feng, Zaiwen [1]
Affiliations
[1] Huazhong Agr Univ, Coll Informat, Wuhan, Hubei, Peoples R China
[2] Charles Sturt Univ, Sch Comp Math & Engn, Wagga Wagga, NSW, Australia
[3] Charles Sturt Univ, AI & Cyber Futures Inst, Bathurst, NSW, Australia
[4] Univ Mines & Technol, Fac Engn, Tarkwa, Ghana
Keywords
Entity resolution; Graph differential dependency; Graph neural network; Explainable entity linking; LINKING; RULES;
DOI
10.1016/j.is.2025.102551
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper studies the entity resolution (ER) problem in property graphs. ER is the task of identifying and linking different records that refer to the same real-world entity. It is commonly used in data integration, data cleansing, and other applications where accurate and consistent data are important. In general, two predominant approaches exist in the literature: rule-based and learning-based methods. On the one hand, rule-based techniques are often desired due to their explainability and ability to encode domain knowledge. Learning-based methods, on the other hand, are preferred due to their effectiveness in spite of their black-box nature. In this work, we devise a hybrid ER solution, GraphER, that leverages the strengths of both approaches for property graphs. In particular, we adopt graph differential dependencies (GDDs) for encoding the so-called record-matching rules, and employ them to guide graph neural network (GNN) based representation learning for the task. We conduct an extensive empirical evaluation of our proposal on benchmark ER datasets, including 17 graph datasets and 7 relational datasets, in comparison with 10 state-of-the-art (SOTA) techniques. The results show that our approach provides a significantly better solution for ER in graph data, both quantitatively and qualitatively, while attaining highly competitive results on the benchmark relational datasets w.r.t. the SOTA solutions.
Pages: 16