How Data Anonymisation Techniques influence Disease Triage in Digital Health: A study on Base Rate Neglect

Times cited: 0
Authors
Podlesny, Nikolai J. [1]
Kayem, Anne V. D. M. [1]
Meinel, Christoph [1]
Jungmann, Sven [2]
Affiliations
[1] Hasso Plattner Institute, Potsdam, Germany
[2] FoundersLane GmbH, Berlin, Germany
Source
PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON DIGITAL PUBLIC HEALTH (DPH '19) | 2019
Keywords
DIAGNOSIS; PRIVACY; NOISE
DOI
10.1145/3357729.3357737
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In the digital health area, there is a growing trend towards data-driven disease diagnostics and prescription triage. Discussions with health industry partners have revealed that distributed high-dimensional data repositories are helpful not only to medical and drug research but also to algorithms that support day-to-day medical diagnostics, such as detecting atrial fibrillation (AFib) [53]. Yet recent privacy legislation in Europe requires that such data repositories be anonymised to protect against the exposure of personal information. Existing anonymisation algorithms transform the data to remove outliers that could otherwise allow individual records to be re-identified. While this protects against data exposure, anonymisation inadvertently results in base rate neglect(1). In the medical diagnostics context, base rate neglect can lead to false diagnoses and flawed prescription triage, which is undesirable. In this paper, we study the impact of different anonymisation techniques on real-world disease diagnostics and how they may influence decision making, using a real-world case as well as a semi-synthetic health data set. We demonstrate that the best results in countering base rate neglect while ensuring data anonymity are obtained by composing several selected anonymisation approaches that are assigned dynamically per row and incorporate attribute compartmentation.
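To make the abstract's central mechanism concrete, here is a minimal Python sketch; it is an illustration, not the authors' method. It assumes a hypothetical toy data set, a simplified suppression-only form of k-anonymity (any record whose quasi-identifier combination occurs fewer than k times is dropped), and invented test characteristics (sensitivity and specificity of 0.9). The names `suppress_for_k_anonymity`, `prevalence` and `posterior` are hypothetical. The sketch shows how removing rare records can shift the disease prevalence seen in the anonymised data, and how that shifted base rate then changes P(AFib | positive test) under Bayes' theorem.

```python
from collections import Counter

# Hypothetical toy records; (age band, postcode prefix) act as quasi-identifiers.
# 80 healthy and 10 AFib patients share common combinations, while 10 further
# AFib patients (aged 80-89) each have a unique, i.e. outlying, combination.
records = (
    [{"age": "30-39", "zip": "101", "diagnosis": "healthy"} for _ in range(80)]
    + [{"age": "70-79", "zip": "101", "diagnosis": "AFib"} for _ in range(10)]
    + [{"age": "80-89", "zip": f"2{i:02d}", "diagnosis": "AFib"} for i in range(10)]
)


def suppress_for_k_anonymity(rows, quasi_ids, k):
    """Drop every row whose quasi-identifier combination occurs fewer than k times.

    Hypothetical, suppression-only stand-in for k-anonymity; real anonymisers
    also generalise values instead of only deleting outliers.
    """
    def key(row):
        return tuple(row[q] for q in quasi_ids)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= k]


def prevalence(rows, value="AFib"):
    """Fraction of rows with the given diagnosis, i.e. the base rate in the data."""
    return sum(row["diagnosis"] == value for row in rows) / len(rows)


def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


anonymised = suppress_for_k_anonymity(records, ["age", "zip"], k=5)

# Assumed test characteristics, chosen only for illustration.
SENSITIVITY, SPECIFICITY = 0.9, 0.9
for label, rows in [("original", records), ("anonymised", anonymised)]:
    base_rate = prevalence(rows)
    p = posterior(base_rate, SENSITIVITY, SPECIFICITY)
    print(f"{label}: base rate={base_rate:.3f}, P(AFib | positive)={p:.3f}")
```

With these assumed numbers, suppression removes the ten AFib patients whose quasi-identifiers are unique, so the apparent prevalence falls from 0.200 to 0.111 and the posterior drops from about 0.692 to 0.529. Any diagnostic model or clinician relying on the anonymised base rate would therefore underestimate the probability of disease, which is one way anonymisation can distort the base rates that downstream triage depends on.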
Pages: 55-62
Number of pages: 8
Related papers (51 in total)
[41] Attribute Compartmentation and Greedy UCC Discovery for High-Dimensional Data Anonymisation [J].
Podlesny, Nikolai J.; Kayem, Anne V. D. M.; Meinel, Christoph.
PROCEEDINGS OF THE NINTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY (CODASPY '19), 2019: 109-119
[42] Podlesny Nikolai J., 2018, INT C DAT EXP SYST A
[43] Probing genetic overlap among complex human phenotypes [J].
Rzhetsky, Andrey; Wajngurt, David; Park, Naeun; Zheng, Tian.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104(28): 11694-11699
[44] Machine learning for detection and diagnosis of disease [J].
Sajda, Paul.
ANNUAL REVIEW OF BIOMEDICAL ENGINEERING, 2006, 8: 537-565
[45] Samarati P., 1998, PROTECTING PRIVACY D
[46] Network-Based Elucidation of Human Disease Similarities Reveals Common Functional Modules Enriched for Pluripotent Drug Targets [J].
Suthram, Silpa; Dudley, Joel T.; Chiang, Annie P.; Chen, Rong; Hastie, Trevor J.; Butte, Atul J.
PLOS COMPUTATIONAL BIOLOGY, 2010, 6(02)
[47] k-anonymity: A model for protecting privacy [J].
Sweeney, L.
INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2002, 10(05): 557-570
[48] Thompson C., THEORY CORRES
[49] The worst-case time complexity for generating all maximal cliques and computational experiments [J].
Tomita, Etsuji; Tanaka, Akira; Takahashi, Haruhisa.
THEORETICAL COMPUTER SCIENCE, 2006, 363(01): 28-42
[50] Detecting atrial fibrillation by deep convolutional neural networks [J].
Xia, Yong; Wulan, Naren; Wang, Kuanquan; Zhang, Henggui.
COMPUTERS IN BIOLOGY AND MEDICINE, 2018, 93: 84-92