Efficient Entity Resolution Based on Sequence Rules

Cited by: 0
Authors
Li, Yakun [1 ]
Wang, Hongzhi [1 ]
Gao, Hong [1 ]
Affiliation
[1] Harbin Institute of Technology, Harbin, People's Republic of China
Source
ADVANCED RESEARCH ON COMPUTER SCIENCE AND INFORMATION ENGINEERING, PT I | 2011 / Vol. 152
Keywords
Entity resolution; Record matching; Bloom filter
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Entity resolution (ER) finds the data objects that refer to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which judges whether two tuples refer to the same real-world entity. Record matching is a longstanding problem, but current methods cannot meet the requirements of the massive and complex data found in applications. We present sequence-rule-based record matching (SeReMatching), which considers both the values of the attributes and their importance in record matching. With the help of a modified Bloom filter, the algorithm greatly increases the checking speed and makes the complexity of entity resolution almost O(n). Extensive experiments are performed to evaluate our methods.
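To illustrate why a Bloom filter can bring duplicate checking close to O(n), here is a minimal sketch. It is not the SeReMatching algorithm: the abstract does not describe the sequence rules or how the Bloom filter was modified, so this uses a standard Bloom filter. The names `BloomFilter`, `match_key`, `candidate_duplicates`, and the `attribute_order` parameter are hypothetical, and the filter size and hash count are arbitrary.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over strings (assumption: not the paper's modified variant)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def match_key(record, attribute_order):
    # Hypothetical stand-in for a rule: join normalized attribute values
    # in an assumed order of importance.
    return "|".join(str(record.get(a, "")).strip().lower() for a in attribute_order)


def candidate_duplicates(records, attribute_order):
    """Single pass over the records; each check costs a constant number of hashes."""
    seen = BloomFilter()
    flagged = []
    for index, record in enumerate(records):
        key = match_key(record, attribute_order)
        if key in seen:
            flagged.append(index)  # probably seen before; may be a false positive
        else:
            seen.add(key)
    return flagged


records = [
    {"name": "Ann Lee", "city": "Harbin"},
    {"name": " ann lee ", "city": "HARBIN"},
    {"name": "Bob Ray", "city": "Harbin"},
]
flagged = candidate_duplicates(records, ["name", "city"])
```

Each record is hashed and checked once, so the pass is linear in the number of records. A Bloom filter never misses a key it has stored but can report false positives, so flagged records would still need an exact comparison before being merged.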
Pages: 381-388
Page count: 8