Entity Resolution-Based Jaccard Similarity Coefficient for Heterogeneous Distributed Databases

Cited by: 11
Authors
Dharavath, Ramesh [1 ]
Singh, Abhishek Kumar [1 ]
Affiliation
[1] Indian Sch Mines, Dept Comp Sci & Engn, Dhanbad 826004, Bihar, India
Source
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 1 | 2016 / Vol. 379
Keywords
Entity resolution (ER); Distributed database; Jaccard similarity coefficient; Markov logic;
DOI
10.1007/978-81-322-2517-1_48
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Entity resolution (ER) is the task of identifying records that refer to the same real-world entity. It is also known as data object matching or deduplication, and it has been a leading research topic in the field of structured databases. Because of its significance, entity resolution remains one of the most important challenges for heterogeneous distributed databases. Several methods have been proposed for entity resolution, but they have yielded unsatisfactory results. In this paper, we propose an efficient integrated solution to the entity resolution problem in heterogeneous distributed databases that combines Markov logic with the Jaccard similarity coefficient. The approach that we have implemented gives an overall success rate of about 98%, which is better than the previously implemented algorithms.
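The Jaccard similarity coefficient named in the abstract is the size of the intersection of two sets divided by the size of their union. The following is a minimal sketch of how it could be used to compare two records. It is not the authors' implementation: the whitespace tokenizer, the function names, the sample records, and the 0.5 threshold are all assumptions made for illustration, and the Markov logic component of the paper is not covered.

```python
from __future__ import annotations


def tokenize(record: str) -> set[str]:
    """Split a record into a set of lowercase whitespace-separated tokens (assumed tokenizer)."""
    return set(record.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    if not a and not b:
        # Convention chosen here: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def same_entity(r1: str, r2: str, threshold: float = 0.5) -> bool:
    """Flag two records as a candidate match when their token sets are similar enough.

    The threshold value is a hypothetical choice, not one taken from the paper.
    """
    return jaccard(tokenize(r1), tokenize(r2)) >= threshold
```

With this sketch, two hypothetical records that contain the same tokens in a different order, such as "Ramesh Dharavath ISM Dhanbad" and "Dharavath Ramesh ISM Dhanbad", have a coefficient of 1.0 and are flagged as a candidate match, while records that share no tokens score 0.0.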
Pages: 497-507
Number of pages: 11