Landmarks-based Blocking Method For Large-scale Entity Resolution
Cited by: 0
Authors:
Herath, Samudra [1]
Roughan, Matthew [1]
Glonek, Gary [2]
Affiliations:
[1] Univ Adelaide, ARC Ctr Excellence Math & Stat Frontiers, Adelaide, SA, Australia
[2] Univ Adelaide, Sch Math Sci, Adelaide, SA, Australia
Source:
2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020), 2020
Keywords:
Entity resolution;
record linkage;
data matching;
multidimensional scaling;
KD-trees;
Nearest-Neighbour search;
DOI:
10.1109/DSAA49011.2020.00110
Chinese Library Classification:
TP18 [Artificial intelligence theory];
Subject classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
Large-scale entity resolution (ER) techniques have received tremendous attention due to the emergence of data processing within organizations and governments. The traditional ER process requires a pairwise comparison between every pair of records to identify records that belong to the same entity, which is computationally prohibitive for large databases. Although many indexing techniques exist to address this issue, it remains an open research question. We propose a landmarks-based indexing algorithm to reduce the number of pairwise comparisons between non-matching records. The blocks are determined based on pre-selected records, called landmarks, in a multidimensional Euclidean space. Restricting pairwise comparisons to records within these blocks reduces the search space immensely. Our method is scalable for big data entity resolution as it has O(n) insertion and query complexity.
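The blocking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes records have already been embedded as coordinates in a Euclidean space (the keywords suggest multidimensional scaling for this step), that the landmarks are given, and that each record joins the block of its nearest landmark. The function names `assign_blocks` and `candidate_pairs` and the sample data are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def assign_blocks(points, landmarks):
    """Map each landmark index to the record indices closest to it.

    Assumption: nearest-landmark assignment; the paper may define
    blocks differently.
    """
    blocks = {i: [] for i in range(len(landmarks))}
    for rec_id, p in enumerate(points):
        # Linear scan over landmarks; a KD-tree could replace this
        # when there are many landmarks.
        nearest = min(range(len(landmarks)),
                      key=lambda i: dist(p, landmarks[i]))
        blocks[nearest].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Return record pairs that share a block; only these get compared."""
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


# Hypothetical embedded records: two tight clusters in 2-D.
points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (0.05, 0.05)]
landmarks = [(0.0, 0.0), (5.0, 5.0)]

blocks = assign_blocks(points, landmarks)
pairs = candidate_pairs(blocks)
```

With these five sample records, exhaustive comparison would need 10 pairs, while comparing only within blocks leaves 4 candidate pairs.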
Pages: 773-774
Number of pages: 2