An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage

Cited by: 2
Authors
Sun, Siqi [1]
Qian, Yining [1]
Zhang, Ruoshi [1]
Wang, Yanqi [1]
Li, Xinran [1]
Affiliation
[1] Huazhong Agricultural University, College of Science, Department of Mathematics and Statistics, Wuhan 430070, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
privacy-preserving record linkage; Chinese characters; SoundShape code; Bloom filter; proportions of SoundShape code
DOI
10.3390/e23081091
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
With the development of information technology, sharing data from multiple sources without disclosing private information has become a popular research topic. Privacy-preserving record linkage (PPRL) links records that truly match without disclosing personal information. Existing PPRL techniques have mostly been studied for alphabetic languages, which differ substantially from the Chinese language environment. In this paper, Chinese characters (the identification fields in record pairs) are encoded into strings of letters and numbers using the SoundShape code, which reflects both their shapes and their pronunciations. The SoundShape codes are then encrypted with a Bloom filter, and the similarity of the encrypted fields is calculated with the Dice similarity. The method takes into account the false positive rate of the Bloom filter and different proportions of sound code and shape code. Finally, we applied the method to synthetic datasets and compared precision, recall, F1-score and computational time across different values of the false positive rate and of the proportion. The results showed that our method for PPRL in the Chinese language environment improved the quality of the classification results and outperformed other methods at a relatively low additional computational cost.
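The Bloom filter and Dice similarity steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter length `m`, the number of hash functions `k`, the bigram split, the double-hashing scheme and the choice of SHA-256 and BLAKE2b are all assumptions made for illustration, and the input strings stand in for SoundShape codes without following the real coding rules. The `weighted_similarity` function is one hypothetical way to apply a proportion between sound code and shape code; the abstract does not specify how the proportion enters the comparison.

```python
import hashlib


def qgrams(text, q=2):
    """Split a string into its set of overlapping q-grams (bigrams by default)."""
    if len(text) < q:
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def bloom_encode(text, m=64, k=4, q=2):
    """Return the set of bit positions a Bloom filter of length m sets for `text`.

    Each q-gram is mapped to k positions by double hashing,
    pos_i = (h1 + i * h2) mod m. All parameters are illustrative assumptions.
    """
    bits = set()
    for gram in qgrams(text, q):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha256(data).hexdigest(), 16)
        h2 = int(hashlib.blake2b(data).hexdigest(), 16)
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over two sets of bit positions."""
    total = len(bits_a) + len(bits_b)
    if total == 0:
        return 0.0
    return 2 * len(bits_a & bits_b) / total


def weighted_similarity(sound_a, shape_a, sound_b, shape_b, w_sound=0.5):
    """Hypothetical weighting: blend sound-code and shape-code Dice scores.

    w_sound is the assumed proportion given to the sound code; the shape
    code receives 1 - w_sound.
    """
    sound_sim = dice(bloom_encode(sound_a), bloom_encode(sound_b))
    shape_sim = dice(bloom_encode(shape_a), bloom_encode(shape_b))
    return w_sound * sound_sim + (1 - w_sound) * shape_sim


if __name__ == "__main__":
    # Placeholder strings, not real SoundShape codes.
    print(weighted_similarity("F7211", "A83440", "F7214", "A83440", w_sound=0.4))
```

Because only bit positions are compared, the two parties exchange encoded filters rather than the underlying codes. A smaller `m` or larger `k` fills the filter faster, which raises the false positive rate that the abstract names as a tuning parameter.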
Pages: 14