Privacy-preserving record linkage using reference set based encoding: A single parameter method

Citations: 0
Authors
Ziyad, Sumayya [1,2]
Christen, Peter [1 ]
Vidanage, Anushka [1 ]
Nanayakkara, Charini [1 ]
Schnell, Rainer
Affiliations
[1] Australian Natl Univ, Canberra 2600, Australia
[2] Univ Duisburg Essen, D-47057 Duisburg, Germany
Keywords
Record linkage; Privacy preserving record linkage; Bloom filter; Binary encoding
DOI
10.1016/j.is.2025.102569
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Record linkage is the process of matching records that refer to the same entity across two or more databases. In many application areas, ranging from healthcare to government services, the databases to be linked contain sensitive personal information and hence cannot be shared across organisations. Privacy-Preserving Record Linkage (PPRL) aims to overcome this challenge by facilitating the comparison of records that have been encoded or encrypted, thereby allowing linkage without the need to share any sensitive data. While various PPRL techniques have been developed, most of them do not properly address privacy concerns, such as the various vulnerabilities of encoded data with regard to cryptanalysis attacks. Furthermore, existing PPRL methods do not provide conceptual analyses of how a user should set the various parameters required, possibly leading to sub-optimal results with regard to both linkage quality and privacy protection. Here we present a novel encoding method for PPRL that employs reference q-gram sets to generate bit arrays that represent sensitive values. Our method requires a single user parameter that determines a trade-off between linkage quality, scalability, and privacy. All other parameters are either data driven or have strong bounds based on the user-set parameter. Furthermore, our method addresses the length, frequency, and pattern-based PPRL vulnerabilities that are exploited by existing PPRL attacks. We conceptually analyse our method and experimentally evaluate it using multiple databases. Our results show that our method provides robust results for both high linkage quality and strong privacy protection.
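The abstract mentions reference q-gram sets and bit arrays without giving the construction. The following is only a minimal sketch of the general idea of a reference-based binary encoding, not the authors' method: each bit records whether a value's q-gram set is similar enough to one reference q-gram set. The function names, the reference values, the Dice similarity measure, and the `threshold` parameter are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic reference-based bit encoding.
# Nothing here is taken from the paper; all names and values are hypothetical.

def qgrams(value, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    s = value.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def dice(a, b):
    """Dice coefficient between two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def encode(value, reference_sets, threshold=0.3, q=2):
    """Set bit i to 1 if the value is similar enough to reference set i."""
    grams = qgrams(value, q)
    return [1 if dice(grams, ref) >= threshold else 0 for ref in reference_sets]

def bit_dice(x, y):
    """Dice coefficient between two equal-length bit arrays."""
    common = sum(a & b for a, b in zip(x, y))
    total = sum(x) + sum(y)
    return 2 * common / total if total else 0.0

# Hypothetical reference values that both database owners would agree on.
REFERENCE_VALUES = ["smith", "miller", "jones", "peter"]
REFERENCE_SETS = [qgrams(r) for r in REFERENCE_VALUES]

enc_a = encode("smith", REFERENCE_SETS)
enc_b = encode("smyth", REFERENCE_SETS)
similarity = bit_dice(enc_a, enc_b)
```

In this toy setup the two parties compare only the bit arrays `enc_a` and `enc_b`, never the original strings. A realistic encoding would need many more reference sets and the privacy safeguards the abstract refers to.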
Pages: 20