A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

Cited by: 144
Authors
El Emam, Khaled [1, 3]
Dankar, Fida Kamal [1]
Issa, Romeo [4]
Jonker, Elizabeth [1]
Amyot, Daniel [2]
Cogo, Elise [1]
Corriveau, Jean-Pierre [4]
Walker, Mark [5]
Chowdhury, Sadrul [2]
Vaillancourt, Regis [1]
Roffey, Tyson [1]
Bottomley, Jim [1]
Affiliations
[1] Childrens Hosp Eastern Ontario, Res Inst, Ottawa, ON K1H 8L1, Canada
[2] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON, Canada
[3] Univ Ottawa, Fac Med, Ottawa, ON, Canada
[4] Carleton Univ, Sch Comp Sci, Ottawa, ON K1S 5B6, Canada
[5] Ottawa Hosp, Res Inst, Ottawa, ON, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
HIPAA PRIVACY RULE; INSURANCE PORTABILITY; ACCOUNTABILITY ACT; ACCESS;
DOI
10.1197/jamia.M3144
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified. Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and is suitable for health datasets. Design: The authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms (Datafly, Samarati, and Incognito) on six public, hospital, and registry datasets for different values of k and different suppression limits. Measurement: Three information loss metrics were used for the comparison: precision, the discernability metric, and non-uniform entropy. Each algorithm's execution speed was also evaluated. Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
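As a rough illustration of two terms used in the abstract, the sketch below checks the k-anonymity criterion on a toy table and computes a discernability score. It is not the OLA algorithm and is not taken from the paper: the function names, the toy records, and the choice of quasi-identifiers are all hypothetical, and the discernability formula is the commonly cited form (each record in a class of size at least k is charged the class size; each record in a smaller class is treated as suppressed and charged the table size), assumed here rather than confirmed by this record.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())

def discernability(records, quasi_identifiers, k):
    """Assumed discernability metric: n^2 for classes with n >= k,
    n * (table size) for smaller classes, which are treated as suppressed."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    total = len(records)
    return sum(n * n if n >= k else n * total for n in sizes.values())

# Hypothetical, already-generalized toy data (age in 10-year ranges).
records = [
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "F"},
    {"age": "40-49", "sex": "M"},
    {"age": "40-49", "sex": "M"},
    {"age": "40-49", "sex": "M"},
    {"age": "50-59", "sex": "F"},
]
QI = ["age", "sex"]

print(is_k_anonymous(records, QI, 2))   # the single 50-59/F record violates k = 2
print(discernability(records, QI, 2))
```

In this toy table the lone 50-59/F record is what breaks 2-anonymity; a de-identification algorithm would either generalize further or suppress that record, and an information loss metric such as the one sketched here is how alternative solutions would be compared.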
Pages: 670-682
Page count: 13