An enhanced privacy-preserving record linkage approach for multiple databases

Times cited: 3
Authors
Han, Shumin [1]
Shen, Derong [1]
Nie, Tiezheng [1]
Kou, Yue [1]
Yu, Ge [1]
Affiliation
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
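Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2022 / Vol. 25 / Issue 05
Funding
National Natural Science Foundation of China
Keywords
Record linkage; Privacy; Bloom filter; Multi-LUs; Blocking
DOI
10.1007/s10586-022-03590-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract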
For research purposes, organizations often need to share and link data belonging to the same individual while protecting that individual's privacy; this task is referred to as privacy-preserving record linkage (PPRL). Various approaches have been developed to tackle this problem; however, it remains challenging because of the massive amount of data, the multiple data sources, and 'dirty' data. Therefore, this paper proposes an enhanced approximate multi-party PPRL (MP-PPRL) approach that improves privacy, scalability, and linkage quality. For privacy, the Bloom filter (BF) has so far proven to be a more effective and efficient masking technique than the alternatives, so records are encoded into BFs. However, BFs may be compromised through frequency-based attacks. To enhance privacy, a distributed protocol that introduces multiple linkage units (Multi-LUs) to resist frequency-based attacks is proposed. For scalability, we develop a blocking technique called BF-SNN, based on the sorted nearest neighborhood (SNN) approach, which clusters similar BFs across multiple databases and dramatically reduces complexity. For linkage quality, a personalized threshold that varies with the level of 'dirty' data is introduced; it provides more accurate error tolerance for 'dirty' data and consequently improves linkage quality. An analysis and an empirical study on large real-world datasets demonstrate the benefits of the proposed approach.
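The encoding, similarity, and blocking steps summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes the common q-gram Bloom-filter encoding with keyed double hashing (HMAC-SHA1 and HMAC-SHA256), the Dice coefficient as the BF similarity, and a plain sorted-neighborhood window over the pooled filters of all parties. The parameters BF_LEN, NUM_HASH, and SECRET_KEY, all function names, and the integer sort key are hypothetical choices. The Multi-LU protocol and the personalized threshold are not modelled; a single global threshold stands in for the latter.

```python
import hashlib
import hmac

# Hypothetical parameters; the paper's actual settings are not given in the abstract.
BF_LEN = 1000                        # Bloom filter length in bits
NUM_HASH = 20                        # hash functions applied per q-gram
SECRET_KEY = b"shared-secret-key"    # hypothetical key shared by the database owners


def qgrams(value, q=2):
    """Return the set of padded character q-grams (bigrams by default) of a string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode(fields, key=SECRET_KEY, m=BF_LEN, k=NUM_HASH):
    """Hash every q-gram of every field into one Bloom filter, stored as an int bit array."""
    bf = 0
    for field in fields:
        for gram in qgrams(field):
            msg = gram.encode()
            h1 = int(hmac.new(key, msg, hashlib.sha1).hexdigest(), 16)
            h2 = int(hmac.new(key, msg, hashlib.sha256).hexdigest(), 16)
            for i in range(k):                 # double hashing: g_i = (h1 + i*h2) mod m
                bf |= 1 << ((h1 + i * h2) % m)
    return bf


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def dice(a, b):
    """Dice coefficient of two Bloom filters: 2|A AND B| / (|A| + |B|)."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def snn_candidates(databases, window=3):
    """Sorted-neighborhood blocking over Bloom filters from several parties.

    `databases` maps a party name to a list of (record_id, bf) pairs. All filters
    are pooled and sorted by their integer value (a placeholder key, not the
    paper's BF-SNN key). Each filter is compared only with the next window-1
    filters, and only pairs from different parties are kept.
    """
    pool = sorted(
        ((bf, party, rid) for party, recs in databases.items() for rid, bf in recs),
        key=lambda t: t[0],
    )
    pairs = set()
    for i, (_, p1, r1) in enumerate(pool):
        for _, p2, r2 in pool[i + 1:i + window]:
            if p1 != p2:
                pairs.add(frozenset([(p1, r1), (p2, r2)]))
    return pairs


def link(databases, threshold=0.8, window=3):
    """Keep candidate pairs whose Dice similarity reaches one global threshold."""
    lookup = {(p, rid): bf for p, recs in databases.items() for rid, bf in recs}
    matches = set()
    for pair in snn_candidates(databases, window):
        x, y = pair
        if dice(lookup[x], lookup[y]) >= threshold:
            matches.add(pair)
    return matches


if __name__ == "__main__":
    dbs = {
        "A": [("a1", encode(["john", "smith"]))],
        "B": [("b1", encode(["jon", "smith"]))],
        "C": [("c1", encode(["mary", "jones"]))],
    }
    print(link(dbs, threshold=0.7))
```

With these toy records one would expect only the a1/b1 pair ("john smith" and "jon smith") to pass the threshold, although exact Dice values depend on the key and parameters. Sorting by the integer value of the bit array is only a placeholder key: it places filters that share high-order bits next to each other, which does not guarantee that similar records become neighbors, so a practical blocking key needs more care.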
Pages: 3641-3652
Number of pages: 12
Related papers (50 in total)
  • [2] Accurate privacy-preserving record linkage for databases with missing values
    Vaiwsri, Sirintra
    Ranbaduge, Thilina
    Christen, Peter
    Schnell, Rainer
    INFORMATION SYSTEMS, 2022, 106
  • [4] Privacy-Preserving Record Linkage with Spark
    Valkering, Onno
    Belloum, Adam
    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019 : 440 - 448
  • [5] Differential Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage
    Yin, Weifeng
    Yuan, Lifeng
    Ren, Yizhi
    Meng, Weizhi
    Wang, Dong
    Wang, Qiuhua
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 6665 - 6678
  • [6] Towards Privacy-Preserving Record Linkage with Record-Wise Linkage Policy
    Kaiho, Takahito
    Lu, Wen-jie
    Amagasa, Toshiyuki
    Sakuma, Jun
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT I, 2017, 10438 : 233 - 248
  • [8] Privacy-preserving record linkage in large databases using secure multiparty computation
    Laud, Peeter
    Pankova, Alisa
    BMC MEDICAL GENOMICS, 2018, 11
  • [9] A Tutorial on Blocking Methods for Privacy-Preserving Record Linkage
    Karapiperis, Dimitrios
    Verykios, Vassilios S.
    Katsiri, Eleftheria
    Delis, Alex
    ALGORITHMIC ASPECTS OF CLOUD COMPUTING, ALGOCLOUD 2015, 2016, 9511 : 3 - 15
  • [10] Privacy-preserving record linkage using Bloom filters
    Schnell, Rainer
    Bachteler, Tobias
    Reiher, Jörg
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2009, 9