Machine Learning Models that Remember Too Much

Cited by: 255
Authors
Song, Congzheng [1 ]
Ristenpart, Thomas [2 ]
Shmatikov, Vitaly [2 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
[2] Cornell Tech, New York, NY USA
Source
CCS'17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security | 2017
Funding
National Science Foundation (USA)
Keywords
privacy; machine learning;
DOI
10.1145/3133956.3134077
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data. We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, yet the model is as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract memorized information from the model. We evaluate our techniques on standard ML tasks for image classification (CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20 Newsgroups and IMDB). In all cases, we show how our algorithms create models that have high predictive power yet allow accurate extraction of subsets of their training data.
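To make the white-box threat model concrete, one simple memorization scheme in the spirit the abstract describes is hiding secret bits in the least-significant mantissa bits of the trained parameters, which leaves predictions essentially unchanged. The sketch below is illustrative NumPy code, not the authors' implementation; the function names and the choice of 8 low bits per float32 weight are our assumptions.

```python
import numpy as np

def encode_secret(params, secret_bits, n_lsb=8):
    # Reinterpret float32 weights as uint32 so we can edit individual bits.
    raw = params.astype(np.float32).view(np.uint32).copy()
    idx = 0
    for i in range(raw.size):
        for b in range(n_lsb):
            if idx == len(secret_bits):
                return raw.view(np.float32)
            mask = np.uint32(1) << np.uint32(b)
            # Clear bit b of the mantissa, then set it to the secret bit.
            raw[i] = (raw[i] & ~mask) | (np.uint32(secret_bits[idx]) << np.uint32(b))
            idx += 1
    return raw.view(np.float32)

def decode_secret(params, n_bits, n_lsb=8):
    # A white-box adversary reads the same low-order bits back out.
    raw = params.astype(np.float32).view(np.uint32)
    out = []
    for i in range(raw.size):
        for b in range(n_lsb):
            if len(out) == n_bits:
                return out
            out.append(int((raw[i] >> np.uint32(b)) & np.uint32(1)))
    return out

# Demo: hide 24 secret bits in a weight vector, then recover them.
w = np.random.randn(64).astype(np.float32)
secret = [1, 0, 1, 1, 0, 0, 1, 0] * 3
w_enc = encode_secret(w, secret)
recovered = decode_secret(w_enc, len(secret))
```

Overwriting only the 8 lowest of float32's 23 mantissa bits perturbs each weight by a relative factor of roughly 2^-15, so model accuracy is essentially unaffected, which is why such tampering is hard to detect from predictive performance alone.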
Pages: 587-601
Number of pages: 15