A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

Cited by: 147
Authors
El Emam, Khaled [1, 3]
Dankar, Fida Kamal [1]
Issa, Romeo [4]
Jonker, Elizabeth [1]
Amyot, Daniel [2]
Cogo, Elise [1]
Corriveau, Jean-Pierre [4]
Walker, Mark [5]
Chowdhury, Sadrul [2]
Vaillancourt, Regis [1]
Roffey, Tyson [1]
Bottomley, Jim [1]
Affiliations
[1] Childrens Hosp Eastern Ontario, Res Inst, Ottawa, ON K1H 8L1, Canada
[2] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON, Canada
[3] Univ Ottawa, Fac Med, Ottawa, ON, Canada
[4] Carleton Univ, Sch Comp Sci, Ottawa, ON K1S 5B6, Canada
[5] Ottawa Hosp, Res Inst, Ottawa, ON, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
HIPAA PRIVACY RULE; INSURANCE PORTABILITY; ACCOUNTABILITY ACT; ACCESS;
DOI
10.1197/jamia.M3144
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified. Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and is suitable for health datasets. Design: The authors compared OLA (Optimal Lattice Anonymization) empirically with three existing k-anonymity algorithms (Datafly, Samarati, and Incognito) on six public, hospital, and registry datasets for different values of k and suppression limits. Measurement: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's execution speed was also evaluated. Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
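The k-anonymity criterion named in the abstract can be illustrated with a minimal sketch. It is not the OLA algorithm itself, only a check of the property that such algorithms aim to satisfy: every combination of quasi-identifier values must be shared by at least k records. The function names, the sample records, and the choice of `age` and `postal` as quasi-identifiers are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


# Hypothetical, already-generalized records (age in ranges, postal code truncated).
records = [
    {"age": "30-39", "postal": "K1H", "diagnosis": "asthma"},
    {"age": "30-39", "postal": "K1H", "diagnosis": "diabetes"},
    {"age": "40-49", "postal": "K1S", "diagnosis": "asthma"},
    {"age": "40-49", "postal": "K1S", "diagnosis": "flu"},
    {"age": "40-49", "postal": "K1S", "diagnosis": "asthma"},
]
QUASI_IDENTIFIERS = ["age", "postal"]  # assumed quasi-identifiers for this sketch

if __name__ == "__main__":
    for k in (2, 3):
        print(k, is_k_anonymous(records, QUASI_IDENTIFIERS, k))
```

In this toy table the smallest equivalence class has two records, so the data satisfy k = 2 but not k = 3. Reaching k = 3 would require further generalization or suppression, which is where the information loss compared in the abstract comes from.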
Pages: 670-682
Page count: 13