R-U policy frontiers for health data de-identification

Cited by: 17
Authors
Xia, Weiyi [1]
Heatherly, Raymond [2]
Ding, Xiaofeng [3]
Li, Jiuyong [4]
Malin, Bradley A. [1,2]
Affiliations
[1] Vanderbilt Univ, Dept Elect Engn & Comp Sci, Nashville, TN 37235 USA
[2] Vanderbilt Univ, Dept Biomed Informat, Nashville, TN 37235 USA
[3] Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China
[4] Univ S Australia, Sch Informat Technol & Math Sci, Mawson Lakes, SA, Australia
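Funding
Australian Research Council; U.S. National Science Foundation; U.S. National Institutes of Health; National Natural Science Foundation of China
Keywords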
privacy; de-identification; secondary use; policy; optimization; K-ANONYMITY; PRIVACY; INFRASTRUCTURE; INFORMATION; PROTECTION; PLATFORM; RECORDS;
DOI
10.1093/jamia/ocv004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The Health Insurance Portability and Accountability Act Privacy Rule enables healthcare organizations to share de-identified data via two routes. They can either 1) show that re-identification risk is small (e.g., via a formal model, such as k-anonymity) with respect to an anticipated recipient or 2) apply a rule-based policy (i.e., Safe Harbor) that enumerates attributes to be altered (e.g., dates to years). The latter is often invoked because it is interpretable, but it fails to tailor protections to the capabilities of the recipient. The paper shows that rule-based policies can be mapped to a utility (U) and re-identification risk (R) space, which can be searched for a collection, or frontier, of policies that systematically trade off between these goals.
Methods: We extend an algorithm to efficiently compose an R-U frontier using a lattice of policy options. Risk is proportional to the number of patients to which a record corresponds, while utility is proportional to the similarity of the original and de-identified distributions. We allow our method to search 20,000 rule-based policies (out of 2^700) and compare the resulting frontier with k-anonymous solutions and Safe Harbor using the demographics of 10 U.S. states.
Results: The rule-based frontier 1) consists, on average, of 5000 policies, 2% of which enable better utility with less risk than Safe Harbor, and 2) covers a broader spectrum of utility and risk than k-anonymity frontiers.
Conclusions: R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to the anticipated needs and trustworthiness of recipients.
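To make the idea of an R-U frontier concrete, the following is a minimal Python sketch, not the authors' algorithm. This record does not give the paper's search procedure, lattice, or exact risk and utility measures, so everything here is an assumption for illustration: the toy records, the two generalization knobs (`age_bin`, `zip_digits`), a risk score taken as the mean reciprocal group size, and a utility score taken as one minus the total variation distance between the original distribution and one rebuilt by spreading each generalized group evenly over its original cells. The sketch enumerates a 12-policy lattice exhaustively, which would not scale to the 2^700 policies mentioned in the abstract.

```python
from collections import Counter, defaultdict
from itertools import product

# Toy records: (age, 3-digit ZIP prefix). Hypothetical data, not from the paper.
RECORDS = [(34, "372"), (36, "372"), (36, "372"), (41, "372"),
           (47, "370"), (52, "370"), (58, "381"), (58, "381")]


def generalize(record, age_bin, zip_digits):
    """Apply one rule-based policy: bin the age, keep a ZIP prefix."""
    age, zip3 = record
    return (age // age_bin * age_bin, zip3[:zip_digits])


def risk(records, age_bin, zip_digits):
    """Assumed risk measure: mean over records of 1 / (size of its group)."""
    keys = [generalize(r, age_bin, zip_digits) for r in records]
    groups = Counter(keys)
    return sum(1 / groups[k] for k in keys) / len(records)


def utility(records, age_bin, zip_digits):
    """Assumed utility: 1 - total variation distance between the original
    counts and counts rebuilt by spreading each group evenly over its cells."""
    orig = Counter(records)
    cells = defaultdict(list)
    for cell in orig:
        cells[generalize(cell, age_bin, zip_digits)].append(cell)
    diff = 0.0
    for members in cells.values():
        mean = sum(orig[c] for c in members) / len(members)
        diff += sum(abs(orig[c] - mean) for c in members)
    return 1 - 0.5 * diff / len(records)


def frontier(points):
    """Keep policies not dominated by any other (lower risk, higher utility)."""
    def dominated(policy):
        r, u = points[policy]
        return any(r2 <= r and u2 >= u and (r2 < r or u2 > u)
                   for r2, u2 in points.values())
    return sorted(p for p in points if not dominated(p))


# A small lattice of policies: every combination of the two knobs.
POLICIES = list(product((1, 5, 10, 20), (3, 2, 0)))
POINTS = {p: (risk(RECORDS, *p), utility(RECORDS, *p)) for p in POLICIES}
FRONTIER = frontier(POINTS)

for policy in FRONTIER:
    print(policy, POINTS[policy])
```

Each printed line pairs a policy `(age_bin, zip_digits)` with its `(risk, utility)` point. Policies that tie at the same point are all kept, since neither dominates the other.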
Pages: 1029-1041
Page count: 13