Duplicate Detection Exploiting Data Relationships

Author
Herschel, Melanie [1]
Affiliation
[1] Univ Tubingen, Wilhelm Schickard Inst Informat, Lehrstuhl Datenbanksyst, Sand 13, D-72076 Tubingen, Germany
Source
IT-INFORMATION TECHNOLOGY | 2009 / Vol. 51 / Issue 04
Keywords
H.2 [Information Systems: Database Management]; H.2.5 [Information Systems: Database Management: Heterogeneous Databases]; duplicate detection; algorithms; performance; data quality; data integration
DOI
10.1524/itit.2009.0546
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Duplicate detection consists of identifying multiple, different database representations of the same real-world object. State-of-the-art duplicate detection systems usually concentrate on identifying duplicates in a single relational table and thereby ignore that the data may exist in a larger context that, when considered, can significantly improve the performance of duplicate detection. In this paper, we present algorithms that exploit relationships that exist in the data.
Pages: 231 - 234
Number of pages: 4
Related papers
50 in total
  • [1] Duplicate Data Detection Using GNN
    Lu, Hanrong
    Chen, Xin
    Lan, Xuhui
    Zheng, Feng
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016), 2016, : 167 - 170
  • [2] Exploiting multiplex data relationships in Support Vector Machines
    Mygdalis, Vasileios
    Tefas, Anastasios
    Pitas, Ioannis
    PATTERN RECOGNITION, 2019, 85 : 70 - 77
  • [3] BigDedup: A Big Data Integration Toolkit for Duplicate Detection in Industrial Scenarios
    Gagliardelli, Luca
    Zhu, Song
    Simonini, Giovanni
    Bergamaschi, Sonia
    TRANSDISCIPLINARY ENGINEERING METHODS FOR SOCIAL INNOVATION OF INDUSTRY 4.0, 2018, 7 : 1015 - 1023
  • [4] Efficient Similarity Joins for Near-Duplicate Detection
    Xiao, Chuan
    Wang, Wei
    Lin, Xuemin
    Yu, Jeffrey Xu
    Wang, Guoren
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2011, 36 (03):
  • [5] Duplicate record detection: A survey
    Elmagarmid, Ahmed K.
    Ipeirotis, Panagiotis G.
    Verykios, Vassilios S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (01) : 1 - 16
  • [6] Exploiting the Anomaly Detection for High Dimensional Data using Descriptive Approach of Data Mining
    Singh, Bharat
    Kushwaha, Nidhi
    Vyas, O. P.
    2013 4TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER & COMMUNICATION TECHNOLOGY (ICCCT), 2013, : 121 - 128
  • [7] DWCLEANSER: A Framework for Approximate Duplicate Detection
    Thakur, Garima
    Singh, Manu
    Pahwa, Payal
    Tyagi, Nidhi
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, 2011, 198 : 355 - +
  • [8] Duplicate detection algorithms of bibliographic descriptions
    Sitas, Anestis
    Kapidakis, Sarantos
    LIBRARY HI TECH, 2008, 26 (02) : 287 - 301
  • [9] Scalable Iterative Graph Duplicate Detection
    Herschel, Melanie
    Naumann, Felix
    Szott, Sascha
    Taubert, Maik
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (11) : 2094 - 2108
  • [10] An Automatic Blocking Strategy for XML Duplicate Detection
    Leitao, Luis
    Calado, Pavel
    APPLIED COMPUTING REVIEW, 2013, 13 (02) : 42 - 53