A taxonomy of privacy-preserving record linkage techniques

Cited by: 177
Authors
Vatsalan, Dinusha [1]
Christen, Peter [1]
Verykios, Vassilios S. [2]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 0200, Australia
[2] Hellen Open Univ, Sch Sci & Technol, Patras, Greece
Funding
Hungarian Scientific Research Fund;
Keywords
Record linkage; Data matching; Entity resolution; Data quality; Privacy techniques; Survey; SECURITY; GENERATION; FRAMEWORK;
DOI
10.1016/j.is.2012.11.005
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because commonly personal identifying data, such as names, addresses and dates of birth of individuals, are used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research. (C) 2012 Elsevier Ltd. All rights reserved.
Pages: 946-969
Page count: 24