Blinded Anonymization: a method for evaluating cancer prevention programs under restrictive data protection regulations

Cited by: 3
Authors
Bartholomaeus, Sebastian [1]
Hense, Hans Werner [2]
Heidinger, Oliver [1]
Affiliations
[1] Epidemiol Canc Registry North Rhine Westphalia, Munster, Germany
[2] Univ Munster, Inst Epidemiol & Social Med, Munster, Germany
Source
DIGITAL HEALTHCARE EMPOWERING EUROPEANS | 2015 / Vol. 210
Keywords
Data Linkage; Data Aggregation; Confidentiality; Data Protection; Program Evaluation; Registries;
DOI
10.3233/978-1-61499-512-8-424
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Evaluating cancer prevention programs requires collecting and linking case-specific data from multiple sources of the healthcare system. This means complying with data protection regulations, which are restrictive in Germany and will likely become stricter across Europe. To facilitate the mortality evaluation of the German mammography screening program, with more than 10 million eligible women, we developed a method that does not require written individual consent and complies with existing privacy regulations. Our setup is composed of different data owners, a data collection center (DCC) and an evaluation center (EC). Each data owner uses dedicated software that pre-processes plain-text personal identifiers (IDAT) and plain-text evaluation data (EDAT) in such a way that only irreversibly encrypted record assignment numbers (RAN) and pre-aggregated, reversibly encrypted EDAT are transmitted to the DCC. The DCC uses the RANs to perform a probabilistic record linkage based on an established and evaluated algorithm. For potentially identifying attributes within the EDAT ('quasi-identifiers'), we developed a novel process named 'blinded anonymization'. It allows selecting a specific generalization from the pre-processed and encrypted attribute aggregations to create a new data set with assured k-anonymity, without using any plain-text information. The anonymized data is transferred to the EC, where the EDAT is decrypted and used for evaluation. Our concept was approved by German data protection authorities. We implemented a prototype and tested it with more than 1.5 million simulated records containing realistically distributed IDAT. The core processes worked well with regard to performance parameters. We created different generalizations and calculated the respective suppression rates.
We discuss modalities, implications and limitations for large data sets in the cancer registry domain, as well as approaches for further improvement, such as l-diversity and automatic computation of 'optimal' generalizations.
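The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind selecting a generalization on encrypted values and computing its suppression rate. It is not the authors' software. Everything named here is an assumption made for illustration: the key, the two quasi-identifiers (age and postal code), the three generalization levels, the sample records, and the use of an HMAC as a stand-in for the encryption (the paper describes the EDAT as reversibly encrypted, while an HMAC is not reversible). The only property the sketch relies on is that equal plain-text values produce equal tokens, so the collecting side can count group sizes without seeing any plain text.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key, known to the data owners but not to the collection center.
SECRET = b"demo-key-for-illustration-only"


def token(attribute, level, value):
    """Deterministic keyed hash standing in for the encryption of one value."""
    message = f"{attribute}|{level}|{value}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def generalize_age(age):
    """Assumed hierarchy: exact age, 5-year band, 10-year band."""
    five, ten = age // 5 * 5, age // 10 * 10
    return [str(age), f"{five}-{five + 4}", f"{ten}-{ten + 9}"]


def generalize_zip(zip_code):
    """Assumed hierarchy: full code, first 3 digits, first digit."""
    return [zip_code, zip_code[:3] + "**", zip_code[:1] + "****"]


def preprocess(age, zip_code):
    """Data owner side: pre-compute and encrypt every generalization level."""
    return {
        "age": [token("age", lvl, v) for lvl, v in enumerate(generalize_age(age))],
        "zip": [token("zip", lvl, v) for lvl, v in enumerate(generalize_zip(zip_code))],
    }


def anonymize(records, levels, k):
    """Collection side: pick one level per attribute, using tokens only.

    Returns the token tuples of the retained records and the suppression
    rate, i.e. the share of records whose group has fewer than k members.
    """
    keys = [tuple(rec[attr][lvl] for attr, lvl in levels.items()) for rec in records]
    group_sizes = Counter(keys)
    kept = [key for key in keys if group_sizes[key] >= k]
    return kept, 1 - len(kept) / len(keys)


# Six made-up records (age, postal code).
plain = [(50, "48149"), (52, "48143"), (53, "48151"),
         (61, "48147"), (67, "44135"), (69, "48155")]
blinded = [preprocess(age, zip_code) for age, zip_code in plain]

for levels in ({"age": 0, "zip": 0}, {"age": 1, "zip": 1}, {"age": 2, "zip": 2}):
    kept, rate = anonymize(blinded, levels, k=2)
    print(levels, "kept:", len(kept), "suppression rate:", rate)
```

On this toy data, coarser generalization levels retain more records, which illustrates the trade-off between information loss from generalization and information loss from suppression.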
Pages: 424-428
Number of pages: 5