Duplicate Detection Exploiting Data Relationships

Cited by: 0
Author
Herschel, Melanie [1]
Affiliation
[1] University of Tübingen, Wilhelm-Schickard-Institut für Informatik, Lehrstuhl Datenbanksysteme, Sand 13, D-72076 Tübingen, Germany
Source
IT-INFORMATION TECHNOLOGY | 2009 / Vol. 51 / No. 4
Keywords
H.2 [Information Systems: Database Management]; H.2.5 [Information Systems: Database Management: Heterogeneous Databases]; duplicate detection; algorithms; performance; data quality; data integration
DOI
10.1524/itit.2009.0546
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Duplicate detection consists of identifying multiple, different database representations of the same real-world object. State-of-the-art duplicate detection systems usually concentrate on identifying duplicates in a single relational table. In doing so, they ignore that the data may exist in a larger context which, when taken into account, can significantly improve duplicate detection performance. In this paper, we present algorithms that exploit the relationships existing in the data.
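To illustrate why relationships help, here is a minimal, hypothetical sketch in Python. It is not the algorithm presented in the paper. It assumes each record has a `title` attribute and a set of identifiers of related objects (`related`), scores a pair as a weighted sum of title similarity (the standard library's `difflib.SequenceMatcher.ratio()`) and the Jaccard overlap of related objects, and reports pairs whose score reaches a threshold. The record layout, the `weight` and `threshold` values, the sample data, and all function names are illustrative assumptions.

```python
from difflib import SequenceMatcher


def attr_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def related_sim(rel_a: set, rel_b: set) -> float:
    """Jaccard overlap of the identifiers of related objects.

    Returns 0.0 when neither record has related objects, i.e. the
    relationships then contribute no evidence for a match.
    """
    union = rel_a | rel_b
    if not union:
        return 0.0
    return len(rel_a & rel_b) / len(union)


def combined_sim(rec_a: dict, rec_b: dict, weight: float) -> float:
    """Weighted sum of attribute similarity and relationship similarity.

    weight=0.0 ignores relationships (single-table, attribute-only matching).
    """
    return ((1 - weight) * attr_sim(rec_a["title"], rec_b["title"])
            + weight * related_sim(rec_a["related"], rec_b["related"]))


def find_duplicates(records: list, threshold: float = 0.7,
                    weight: float = 0.5) -> list:
    """Naive all-pairs comparison; returns id pairs scoring >= threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if combined_sim(records[i], records[j], weight) >= threshold:
                pairs.append((records[i]["id"], records[j]["id"]))
    return pairs


# Hypothetical sample: m2 has a typo in its title; m3 has exactly m1's
# title but is linked to entirely different related objects.
SAMPLE_RECORDS = [
    {"id": "m1", "title": "Night Train", "related": {"a1", "a2"}},
    {"id": "m2", "title": "Night Trin", "related": {"a1", "a2"}},
    {"id": "m3", "title": "Night Train", "related": {"a7"}},
]

if __name__ == "__main__":
    # Attribute-only: [('m1', 'm2'), ('m1', 'm3'), ('m2', 'm3')]
    print(find_duplicates(SAMPLE_RECORDS, weight=0.0))
    # With relationships: [('m1', 'm2')]
    print(find_duplicates(SAMPLE_RECORDS, weight=0.5))
```

With `weight=0.0` the score reduces to attribute-only matching within one table, and every pair is reported because the titles are nearly identical. With `weight=0.5` the shared related objects keep `m1`/`m2` above the threshold, while `m3`, whose related objects differ completely, is no longer reported as a duplicate. The nested loop compares all pairs and is therefore quadratic in the number of records; it is kept only for brevity.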
Pages: 231-234
Page count: 4