Duplicate Detection Exploiting Data Relationships

Cited by: 0
Author
Herschel, Melanie [1]
Affiliation
[1] Univ Tubingen, Wilhelm Schickard Inst Informat, Lehrstuhl Datenbanksyst, Sand 13, D-72076 Tubingen, Germany
Source
IT-INFORMATION TECHNOLOGY | 2009 / Vol. 51 / Issue 04
Keywords
H.2 [Information Systems]: Database Management; H.2.5 [Information Systems]: Database Management: Heterogeneous Databases; duplicate detection; algorithms; performance; data quality; data integration
DOI
10.1524/itit.2009.0546
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Duplicate detection consists of identifying multiple, different database representations of the same real-world object. State-of-the-art duplicate detection systems usually concentrate on identifying duplicates in a single relational table and thereby ignore that the data may exist in a larger context that, when considered, can significantly improve the performance of duplicate detection. In this paper, we present algorithms that exploit relationships that exist in the data.
Pages: 231-234
Page count: 4