An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage

Cited by: 1
Authors
Sun, Siqi [1]
Qian, Yining [1]
Zhang, Ruoshi [1]
Wang, Yanqi [1]
Li, Xinran [1]
Affiliations
[1] Huazhong Agr Univ, Coll Sci, Dept Math & Stat, Wuhan 430070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
privacy-preserving record linkage; Chinese characters; SoundShape code; Bloom filter; proportions of SoundShape code
DOI
10.3390/e23081091
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Sharing data from multiple sources without disclosing private information has become a widely studied problem as information technology has developed. Privacy-preserving record linkage (PPRL) links records that truly match without disclosing personal information. Existing PPRL techniques have mostly been developed for alphabetic languages, which differ substantially from the Chinese language environment. In this paper, the Chinese characters in the identification fields of record pairs are encoded, according to their shapes and pronunciations, into strings of letters and numbers using the SoundShape code. The SoundShape codes are then encrypted with a Bloom filter, and the similarity of the encrypted fields is calculated with the Dice similarity. The method takes into account the false positive rate of the Bloom filter and different proportions of sound code and shape code. Finally, we applied the method to synthetic datasets and compared precision, recall, F1-score and computational time across different values of the false positive rate and the proportion. The results show that our method for PPRL in the Chinese language environment improved the quality of the classification results and outperformed other methods at a relatively low additional computational cost.
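The Bloom filter and Dice similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input strings are hypothetical placeholders standing in for SoundShape codes (no real SoundShape table is used), and the bigram splitting, the double-hashing scheme with SHA-1 and SHA-256, and the parameters `m` and `k` are all assumptions chosen for illustration. The false positive rate uses the standard approximation (1 - e^(-kn/m))^k for a Bloom filter of `m` bits, `k` hash functions and `n` inserted items.

```python
import hashlib
import math


def qgrams(code: str, q: int = 2) -> set[str]:
    """Split an encoded string into its set of overlapping q-grams."""
    return {code[i:i + q] for i in range(len(code) - q + 1)}


def bloom_positions(code: str, m: int = 1000, k: int = 10, q: int = 2) -> set[int]:
    """Return the bit positions set to 1 when the q-grams of `code`
    are inserted into an m-bit Bloom filter with k hash functions."""
    positions = set()
    for gram in qgrams(code, q):
        data = gram.encode("utf-8")
        # Assumed double-hashing scheme: position_i = (h1 + i * h2) mod m.
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice similarity of two Bloom filters given as sets of 1-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation of the Bloom filter false positive rate."""
    return (1 - math.exp(-k * n / m)) ** k


# Hypothetical placeholder strings, NOT real SoundShape codes.
code_a = "F1A02435B1"
code_b = "F1A02435B2"

bf_a = bloom_positions(code_a)
bf_b = bloom_positions(code_b)
print(dice(bf_a, bf_b))                                # similarity of the encoded fields
print(false_positive_rate(len(qgrams(code_a)), 1000, 10))
```

Plain unkeyed hashes are used here only to keep the sketch short; a deployed PPRL system would typically use keyed hashes shared between the data owners.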
Pages: 14