FEDERAL: A Framework for Distance-Aware Privacy-Preserving Record Linkage

Cited by: 25
Authors
Karapiperis, Dimitrios [1]
Gkoulalas-Divanis, Aris [3]
Verykios, Vassilios S. [2]
Affiliations
[1] Hellen Open Univ, Patras 26335, Greece
[2] Hellen Open Univ, Grad Program Informat Syst, Patras 26335, Greece
[3] IBM Watson Hlth, IBM Watson Platform Hlth, Cambridge, MA 02142 USA
Keywords
Entity resolution; privacy-preserving record linkage; locality-sensitive hashing; similarity joins
DOI
10.1109/TKDE.2017.2761759
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In privacy-preserving record linkage, a number of data custodians encode their records and submit them to a trusted third party that is responsible for identifying those records that refer to the same real-world entity. In this paper, we propose FEDERAL, a novel record linkage framework that implements methods for anonymizing both string and numerical data values, which are typically present in data records. These methods rely on a strong theoretical foundation for rigorously specifying the dimensionality of the anonymization space, into which the original values are embedded, to provide accuracy and privacy guarantees under various models of privacy attacks. A key component of the applied embedding process is the threshold that is required by the distance computations, which we prove can be formally specified to guarantee accurate results. We evaluate our framework using three real-world data sets with varying characteristics. Our experimental findings show that FEDERAL offers a complete and effective solution for accurately identifying matching anonymized record pairs (with recall rates consistently above 93 percent) in large-scale privacy-preserving record linkage tasks.
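The general idea in the abstract (embed values into an anonymization space, then compare the embeddings against a distance threshold) can be illustrated with a minimal Python sketch. This is not FEDERAL's actual encoding: the bigram hashing scheme, the 64-bit vector size, and the function names `bigrams`, `embed`, `hamming`, and `is_match` are all assumptions chosen for illustration, whereas the paper derives the dimensionality and the threshold formally.

```python
import hashlib


def bigrams(value):
    """Return the set of character bigrams of a padded, lowercased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def embed(value, num_bits=64):
    """Embed a string into a bit vector (held in an int).

    Assumption for illustration only: each bigram sets exactly one bit,
    chosen by hashing the bigram. num_bits is an arbitrary choice here.
    """
    vector = 0
    for gram in bigrams(value):
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % num_bits
        vector |= 1 << position
    return vector


def hamming(a, b):
    """Count the bit positions in which two embedded vectors differ."""
    return bin(a ^ b).count("1")


def is_match(a, b, threshold, num_bits=64):
    """Declare a match when the embedded distance is within the threshold."""
    return hamming(embed(a, num_bits), embed(b, num_bits)) <= threshold
```

In this sketch, "smith" and "smyth" differ in four bigrams, so their embeddings differ in at most four bits and `is_match("smith", "smyth", threshold=4)` holds. A third party comparing only the bit vectors never sees the original strings, although a toy encoding like this one offers none of the privacy guarantees the abstract refers to.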
Pages: 292-304
Number of pages: 13