An enhanced privacy-preserving record linkage approach for multiple databases

Cited by: 3
Authors
Han, Shumin [1]
Shen, Derong [1]
Nie, Tiezheng [1]
Kou, Yue [1]
Yu, Ge [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2022 / Vol. 25 / Issue 05
Funding
National Natural Science Foundation of China;
Keywords
Record linkage; Privacy; Bloom filter; Multi-LUs; Blocking;
DOI
10.1007/s10586-022-03590-7
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
For research purposes, organizations often need to share and link records that belong to the same individual while protecting privacy, a task referred to as privacy-preserving record linkage (PPRL). Various approaches have been developed to tackle this problem; however, it remains challenging because of the massive amount of data, multiple data sources, and 'dirty' data. This paper therefore proposes an enhanced approximate multi-party PPRL (MP-PPRL) approach to improve privacy, scalability, and linkage quality. For privacy, the Bloom filter (BF) is so far a better and more efficient masking technique than the alternatives, so records are encoded into BFs. However, BFs may be compromised through frequency-based attacks. To enhance privacy, a distributed protocol that introduces multiple linkage units (Multi-LUs) to resist frequency-based attacks is proposed. For scalability, a blocking technique called BF-SNN, based on the sorted nearest neighborhood (SNN) approach, is developed to cluster similar BFs across multiple databases, which dramatically reduces complexity. For linkage quality, a personalized threshold that varies with the level of 'dirty' data is introduced, which provides more accurate error tolerance for 'dirty' data and consequently improves linkage quality. An analysis and an empirical study on large real-world datasets show the benefit of the proposed approach.
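The Bloom filter masking and sorted-neighbourhood blocking named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The filter length, number of hash functions, bigram tokenization, double-hashing scheme, Dice similarity, and especially the sort key used for blocking are all assumptions chosen for illustration; the abstract does not specify how BF-SNN orders filters, how the Multi-LU protocol works, or how the personalized threshold is computed, so none of those are modelled here.

```python
import hashlib
from itertools import combinations

BF_LEN = 1000    # assumed Bloom filter length in bits (illustrative)
NUM_HASHES = 10  # assumed number of hash functions (illustrative)
Q = 2            # assumed q-gram length (bigrams)


def qgrams(value, q=Q):
    """Split a padded, lower-cased string into its set of q-grams."""
    padded = "_" + value.lower().strip() + "_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode(value, bf_len=BF_LEN, k=NUM_HASHES):
    """Return the set of bit positions that are 1 in the Bloom filter.

    Positions come from double hashing: (h1 + i * h2) mod bf_len.
    """
    bits = set()
    for gram in qgrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.md5(data).hexdigest(), 16)
        for i in range(k):
            bits.add((h1 + i * h2) % bf_len)
    return bits


def dice(bf_a, bf_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bits."""
    if not bf_a and not bf_b:
        return 1.0
    return 2 * len(bf_a & bf_b) / (len(bf_a) + len(bf_b))


def sorted_neighbourhood_pairs(filters, window=3):
    """Generic sorted-neighbourhood blocking over {record_id: bit set}.

    The sort key (sorted bit positions) is a placeholder assumption,
    not the ordering used by the paper's BF-SNN.
    """
    ordered = sorted(filters, key=lambda rid: sorted(filters[rid]))
    pairs = set()
    for start in range(len(ordered)):
        for a, b in combinations(ordered[start:start + window], 2):
            pairs.add(tuple(sorted((a, b))))
    return pairs
```

Only pairs that fall inside the same sliding window would then be compared with `dice`, which is how blocking reduces the number of comparisons relative to comparing every pair across all databases.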
Pages: 3641-3652
Page count: 12