An enhanced privacy-preserving record linkage approach for multiple databases

Cited by: 3
Authors
Han, Shumin [1]
Shen, Derong [1]
Nie, Tiezheng [1]
Kou, Yue [1]
Yu, Ge [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2022, Vol. 25, No. 5
Funding
National Natural Science Foundation of China
Keywords
Record linkage; Privacy; Bloom filter; Multi-LUs; Blocking
DOI
10.1007/s10586-022-03590-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
For research purposes, organizations often need to share and link records that belong to the same individual while protecting that individual's privacy; this task is known as privacy-preserving record linkage (PPRL). Various approaches have been developed to tackle this problem; however, it remains challenging because of massive data volumes, multiple data sources, and 'dirty' data. This paper therefore proposes an enhanced approximate multi-party PPRL (MP-PPRL) approach that improves privacy, scalability, and linkage quality. For privacy, the Bloom filter (BF) is so far a better and more efficient masking technique than the alternatives, so records are encoded into BFs. However, BFs may be compromised by frequency-based attacks. To enhance privacy, a distributed protocol that introduces multiple linkage units (Multi-LUs) is proposed to resist such attacks. For scalability, a blocking technique based on the sorted nearest neighborhood (SNN) approach, called BF-SNN, is developed to cluster similar BFs across multiple databases, which dramatically reduces complexity. For linkage quality, a personalized threshold that varies with the level of 'dirty' data is introduced, which provides more accurate error tolerance and consequently improves linkage quality. An analysis and an empirical study on large real-world datasets show the benefit of the proposed approach.
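As a rough illustration of the Bloom-filter masking step mentioned in the abstract, the following Python sketch encodes a string's bigrams into a Bloom filter and compares two filters with the Dice coefficient. It is not the authors' implementation: the bigram tokenization, the double-hashing scheme built on SHA-1 and SHA-256, the filter size, the number of hash functions, and all function names are assumptions chosen for illustration. The Multi-LU protocol, BF-SNN blocking, and the personalized threshold are not modeled here.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its set of overlapping q-grams (bigrams by default)."""
    text = value.lower().strip()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def bloom_encode(value, size=1000, num_hashes=4):
    """Return the set of bit positions set to 1 in a Bloom filter of `size` bits.

    Assumption: each q-gram is mapped to `num_hashes` positions by double
    hashing, position_i = (h1 + i * h2) mod size.
    """
    bits = set()
    for gram in qgrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


if __name__ == "__main__":
    a = bloom_encode("smith")
    b = bloom_encode("smyth")
    # The two names share the bigrams "sm" and "th", so the filters overlap.
    print(round(dice(a, b), 3))
```

In a linkage setting, each party would compute such filters locally and only the filters would be compared, with a pair treated as a match when its similarity exceeds a chosen threshold; the threshold value is left open here because the abstract does not state one.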
Pages: 3641 - 3652
Page count: 12
Related papers
50 in total
  • [21] Optimization of the Mainzelliste software for fast privacy-preserving record linkage
    Florens Rohde
    Martin Franke
    Ziad Sehili
    Martin Lablans
    Erhard Rahm
    [J]. Journal of Translational Medicine, 19
  • [22] Privacy-Preserving Access Control in Electronic Health Record Linkage
    Lu, Yang
    Sinnott, Richard O.
    Verspoor, Karin
    Parampalli, Udaya
    [J]. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (IEEE TRUSTCOM) / 12TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING (IEEE BIGDATASE), 2018, : 1079 - 1090
  • [23] Optimization of the Mainzelliste software for fast privacy-preserving record linkage
    Rohde, Florens
    Franke, Martin
    Sehili, Ziad
    Lablans, Martin
    Rahm, Erhard
    [J]. JOURNAL OF TRANSLATIONAL MEDICINE, 2021, 19 (01)
  • [24] Efficient Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage
    Christen, Peter
    Ranbaduge, Thilina
    Vatsalan, Dinusha
    Schnell, Rainer
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT I, 2017, 10234 : 628 - 640
  • [25] Post-processing Methods for High Quality Privacy-Preserving Record Linkage
    Franke, Martin
    Sehili, Ziad
    Gladbach, Marcel
    Rahm, Erhard
    [J]. DATA PRIVACY MANAGEMENT, CRYPTOCURRENCIES AND BLOCKCHAIN TECHNOLOGY, 2018, 11025 : 263 - 278
  • [26] MERLIN - A Tool for Multi-party Privacy-preserving Record Linkage
    Ranbaduge, Thilina
    Vatsalan, Dinusha
    Christen, Peter
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015, : 1640 - 1643
  • [27] Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage
    Durham, Elizabeth
    Xue, Yuan
    Kantarcioglu, Murat
    Malin, Bradley
    [J]. INFORMATION FUSION, 2012, 13 (04) : 245 - 259
  • [28] Blockchain-based Privacy-Preserving Record Linkage: enhancing data privacy in an untrusted environment
    Nobrega, Thiago
    Pires, Carlos Eduardo S.
    Nascimento, Dimas Cassimiro
    [J]. INFORMATION SYSTEMS, 2021, 102 (102)
  • [29] Incremental clustering techniques for multi-party Privacy-Preserving Record Linkage
    Vatsalan, Dinusha
    Christen, Peter
    Rahm, Erhard
    [J]. DATA & KNOWLEDGE ENGINEERING, 2020, 128 (128)
  • [30] Privacy-Preserving Linkage of Genomic and Clinical Data Sets
    Baker, Dixie B.
    Knoppers, Bartha M.
    Phillips, Mark
    van Enckevort, David
    Kaufmann, Petra
    Lochmuller, Hanns
    Taruscio, Domenica
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (04) : 1342 - 1348