On the Role of Data Anonymization in Machine Learning Privacy

Cited by: 12
Authors
Senavirathne, Navoda [1 ]
Torra, Vicenc [2 ]
Affiliations
[1] Univ Skovde, Sch Informat, Skovde, Sweden
[2] Univ Umea, Dept Comp Sci, Umea, Sweden
Source
2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020) | 2020
Keywords
data privacy; data anonymization; privacy-preserving machine learning; k-anonymity
DOI
10.1109/TrustCom50675.2020.00093
CLC number
TP3 [Computing Technology, Computer Technology];
Subject classification code
0812;
Abstract
Data anonymization irreversibly transforms raw data into a protected version by eliminating direct identifiers and removing sufficient detail from indirect identifiers, so as to minimize the risk of re-identification when the data must be published. Moreover, data protection laws (e.g., the GDPR) do not treat anonymized data as personal data, allowing them to be freely used, analysed, shared and monetized without compliance risk. Motivated by these advantages, data controllers are likely to anonymize data before releasing them for analysis tasks such as machine learning (ML), which is applied in a wide variety of domains that process personal data. Meanwhile, recent research has shown that ML models are vulnerable to privacy attacks, as they retain sensitive information from their training data. Taking these facts into consideration, in this work we explore the interplay between data anonymization and ML, with the ultimate aim of clarifying whether data anonymization is sufficient to achieve privacy for ML under different adversarial scenarios. We also discuss the challenges and opportunities of integrating the two domains. Our findings make it evident that, in order to substantially reduce the privacy risks in ML, existing data anonymization techniques must be applied at high privacy levels, which causes a deterioration in model utility.
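To make the k-anonymity notion from the keywords concrete, the following is a minimal, hypothetical sketch (not from the paper) of generalizing quasi-identifiers and checking that every generalized group contains at least k records; the attribute choices (age bands, ZIP truncation) and function names are illustrative assumptions only.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: bucket age into decades, truncate the ZIP code.
    (Illustrative generalization scheme, not the one used in the paper.)"""
    age, zip_code = record
    return (age // 10 * 10, zip_code[:3] + "**")

def is_k_anonymous(records, k):
    """True if every combination of generalized quasi-identifier values
    is shared by at least k records, i.e., no group singles anyone out."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())

# Toy dataset of (age, ZIP) pairs; each generalized group ends up with 2 members.
records = [(34, "12345"), (37, "12311"), (52, "54321"), (58, "54399")]
print(is_k_anonymous(records, 2))  # True
print(is_k_anonymous(records, 3))  # False
```

Raising k enlarges the equivalence classes and thus lowers re-identification risk, but it also coarsens the data further, which is the utility trade-off the abstract describes.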
Pages: 664-675
Page count: 12