An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage

Cited by: 1
Authors
Sun, Siqi [1]
Qian, Yining [1]
Zhang, Ruoshi [1]
Wang, Yanqi [1]
Li, Xinran [1]
Affiliation
[1] Huazhong Agr Univ, Coll Sci, Dept Math & Stat, Wuhan 430070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
privacy-preserving record linkage; Chinese characters; SoundShape code; Bloom filter; proportions of SoundShape code
DOI
10.3390/e23081091
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
With the development of information technology, sharing data from multiple sources without disclosing private information has become a widely studied topic. Privacy-preserving record linkage (PPRL) links records that truly match without disclosing personal information. Existing PPRL techniques have mostly been studied for alphabetic languages, which differ considerably from the Chinese language environment. In this paper, Chinese characters (the identification fields in record pairs) are encoded into strings of letters and numbers using the SoundShape code, according to their shapes and pronunciations. The SoundShape codes are then encrypted with a Bloom filter, and the similarity of the encrypted fields is calculated with the Dice similarity. The method takes into account the false positive rate of the Bloom filter and different proportions of sound code and shape code. Finally, we applied the method to synthetic datasets and compared precision, recall, F1-score and computational time across different values of the false positive rate and the proportion. The results show that our method for PPRL in the Chinese language environment improved the quality of the classification results and outperformed other methods at a relatively low additional computational cost.
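The Bloom filter and Dice similarity steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things are assumptions made here for illustration only: the input strings are made-up placeholders rather than real SoundShape codes, the codes are split into bigrams, the filter has 1000 bits and 10 hash functions, and the hash positions come from double hashing with SHA-256 and BLAKE2b. The false positive rate uses the standard approximation (1 - e^(-kn/m))^k for a filter of m bits, k hash functions and n inserted items.

```python
import hashlib
import math


def bigrams(code: str) -> set[str]:
    """Split a code string into its set of overlapping 2-character substrings."""
    return {code[i:i + 2] for i in range(len(code) - 1)}


def bloom_encode(code: str, num_bits: int = 1000, num_hashes: int = 10) -> set[int]:
    """Return the bit positions set to 1 when the bigrams of `code` are inserted.

    Positions are derived by double hashing, (h1 + i * h2) mod num_bits.
    The hash functions and default sizes are illustrative assumptions.
    """
    positions: set[int] = set()
    for gram in bigrams(code):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha256(data).hexdigest(), 16)
        h2 = int(hashlib.blake2b(data).hexdigest(), 16)
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % num_bits)
    return positions


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over the sets of 1-bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty filters
    return 2 * len(a & b) / (len(a) + len(b))


def false_positive_rate(num_items: int, num_bits: int, num_hashes: int) -> float:
    """Standard approximation of a Bloom filter's false positive rate."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


if __name__ == "__main__":
    # Placeholder strings, NOT real SoundShape codes.
    left = bloom_encode("ABCD1234")
    right = bloom_encode("ABCD1299")
    print(dice_similarity(left, right))
    print(false_positive_rate(num_items=7, num_bits=1000, num_hashes=10))
```

Because only the bit positions are compared, two parties can exchange filters instead of the underlying code strings. Raising `num_bits` or lowering `num_hashes` changes the false positive rate, which is the trade-off the abstract says the study varies.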
Pages: 14