Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19

Cited by: 45
Authors
Jakob, Carolin E. M. [1]
Kohlmayer, Florian [2]
Meurers, Thierry [3, 4, 5, 6, 7]
Vehreschild, Jorg Janne [1, 8, 9]
Prasser, Fabian [3, 4, 5, 6, 7]
Affiliations
[1] Univ Hosp Cologne, Cologne, Germany
[2] Tech Univ Munich, Sch Med, Munich, Germany
[3] Berlin Inst Hlth BIH, Berlin, Germany
[4] Charite Univ Med Berlin, Berlin, Germany
[5] Free Univ Berlin, Berlin, Germany
[6] Humboldt Univ, Berlin, Germany
[7] Berlin Inst Hlth, Berlin, Germany
[8] German Ctr Infect Res DZIF, Partner Site Bonn Cologne, Cologne, Germany
[9] Goethe Univ Frankfurt, Dept Internal Med Hematol & Oncol, Frankfurt, Germany
DOI
10.1038/s41597-020-00773-y
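Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract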
The Lean European Open Survey on SARS-CoV-2 Infected Patients (LEOSS) is a European registry for studying the epidemiology and clinical course of COVID-19. To support evidence generation at the rapid pace required in a pandemic, LEOSS follows an Open Science approach, making data available to the public in real time. To preserve patient privacy, quantitative anonymization procedures protect the continuously published data stream, which consists of 16 variables on the course and therapy of COVID-19, from singling out, inference and linkage attacks. We investigated the bias introduced by this process and found that it has very little impact on the quality of output data. Current laws do not specify requirements for the application of formal anonymization methods, guidelines with clear recommendations are lacking, and few real-world applications of quantitative anonymization procedures have been described in the literature. We therefore believe that our work can help others develop urgently needed anonymization pipelines for their projects.
Pages: 10