Query-Driven Approach to Entity Resolution

Cited by: 31
Authors
Altwaijry, Hotham [1]
Kalashnikov, Dmitri V. [1]
Mehrotra, Sharad [1]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013 / Vol. 6 / Issue 14
Funding
National Science Foundation (USA);
DOI
10.14778/2556549.2556567
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
This paper explores "on-the-fly" data cleaning in the context of a user query. A novel Query-Driven Approach (QDA) is developed that performs a minimal number of cleaning steps that are only necessary to answer a given selection query correctly. The comprehensive empirical evaluation of the proposed approach demonstrates its significant advantage in terms of efficiency over traditional techniques for query-driven applications.
Pages: 1846-1857
Page count: 12