Quantifying identifiability to choose and audit ε in differentially private deep learning

Cited by: 3
Authors
Bernau, Daniel [1 ]
Eibl, Guenther [2 ]
Grassal, Philip W. [3 ]
Keller, Hannah [1 ]
Kerschbaum, Florian [4 ]
Affiliations
[1] SAP SE, Karlsruhe, Germany
[2] Salzburg University of Applied Sciences, Salzburg, Austria
[3] Heidelberg University, Heidelberg, Germany
[4] University of Waterloo, Waterloo, ON, Canada
Source
Proceedings of the VLDB Endowment | 2021, Vol. 14, No. 13
Funding
European Union Horizon 2020
Keywords
Composition theorem
DOI
10.14778/3484224.3484231
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Differential privacy allows bounding the influence that training data records have on a machine learning model. To use differential privacy in machine learning, data scientists must choose privacy parameters (ε, δ). Choosing meaningful privacy parameters is key: models trained with weak privacy parameters may leak excessive information, while strong privacy parameters may overly degrade model utility. Privacy parameter values are, however, difficult to choose for two main reasons. First, the theoretical upper bound on privacy loss (ε, δ) may be loose, depending on the chosen sensitivity and the data distribution of practical datasets. Second, legal requirements and societal norms for anonymization often refer to individual identifiability, to which (ε, δ) are only indirectly related. We transform (ε, δ) into a bound on the Bayesian posterior belief of the adversary assumed by differential privacy about the presence of any record in the training dataset. The bound holds for multidimensional queries under composition, and we show that it can be tight in practice. Furthermore, we derive an identifiability bound that relates the adversary assumed in differential privacy to previous work on membership inference adversaries. We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and to compute empirical identifiability scores and empirical (ε, δ).
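
The kind of transformation the abstract describes can be illustrated in the simplest setting. For pure ε-differential privacy and a uniform prior of 1/2 over two neighboring datasets, Bayes' rule caps the adversary's posterior belief at e^ε / (1 + e^ε), since the likelihood ratio of any observed output is at most e^ε. The Python sketch below shows this relation and its inverse; the function names are illustrative only, and the paper's actual bound additionally handles δ and multidimensional queries under composition.

import math

def posterior_belief_bound(epsilon: float) -> float:
    # Upper bound on the DP adversary's posterior belief that a given record
    # is in the training data, assuming pure epsilon-DP and a uniform prior
    # of 1/2: by Bayes' rule the likelihood ratio of the two neighboring
    # datasets is at most e^eps, so the posterior is at most e^eps / (1 + e^eps).
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def epsilon_for_belief(rho: float) -> float:
    # Inverse mapping: the largest epsilon whose posterior-belief bound stays
    # at or below a target identifiability level rho, for 0.5 < rho < 1.
    return math.log(rho / (1.0 - rho))

print(posterior_belief_bound(1.0))  # ~0.731: epsilon = 1 caps belief at 73.1%
print(epsilon_for_belief(0.9))      # ~2.197: belief <= 0.9 requires epsilon <= ln(9)

Read in this direction, a target identifiability level (e.g., a posterior belief of at most 0.9) translates back into a concrete ε, which is the sense in which the paper lets data scientists choose and audit ε via identifiability rather than via the raw privacy parameters.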
Pages: 3335-3347
Number of pages: 13