Knowledge graph based methods for record linkage

Cited by: 3
Authors
Gautam B. [1]
Ramos Terrades O. [1]
Pujadas-Mora J.M. [2]
Valls M. [2]
Affiliations
[1] Computer Vision Center - Universitat Autònoma de Barcelona, Edifici O Campus UAB, Bellaterra
[2] Demographic Center Studies - Universitat Autònoma de Barcelona, Edifici E2 Campus UAB, Bellaterra
Keywords
Author disambiguation; Entity alignment; Historical census data; Knowledge graph embedding; Record linkage
DOI
10.1016/j.patrec.2020.05.025
Abstract
Individual-level data are now widely used in Historical Demography, a consequence of the predominant life-course approach to understanding demographic behaviour, family transitions, mobility, etc. Advanced record linkage is key, since it increases both the complexity and the volume of the data that can be analyzed. However, current methods are constrained to linking data from the same kind of source. Knowledge graphs are flexible semantic representations that encode data variability and semantic relations in a structured manner. In this paper we propose the use of knowledge graph methods to tackle record linkage tasks. The proposed method, named WERL, takes advantage of the main knowledge graph properties and learns embedding vectors to encode census information. These embeddings are weighted to maximize record linkage performance. We have evaluated this method on benchmark data sets and compared it to related methods, with encouraging and satisfactory results. © 2020 Elsevier B.V.
Pages: 127-133
Number of pages: 6