Duplicate Detection Exploiting Data Relationships

Cited by: 0
Authors
Herschel, Melanie [1]
Affiliations
[1] Univ Tubingen, Wilhelm Schickard Inst Informat, Lehrstuhl Datenbanksyst, Sand 13, D-72076 Tubingen, Germany
Source
IT-INFORMATION TECHNOLOGY | 2009 / Vol. 51 / Issue 04
Keywords
H.2 [Information Systems]: Database Management; H.2.5 [Information Systems]: Database Management: Heterogeneous Databases; duplication detection; algorithms; performance; data quality; data integration
DOI
10.1524/itit.2009.0546
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Duplicate detection consists of identifying multiple, different database representations of the same real-world object. State-of-the-art duplicate detection systems usually concentrate on identifying duplicates in a single relational table and thereby ignore that the data may exist in a larger context that, when considered, can significantly improve the performance of duplicate detection. In this paper, we present algorithms that exploit relationships that exist in the data.
Pages: 231-234
Page count: 4