Privacy Preserving Synthetic Data Release Using Deep Learning

Cited by: 70

Authors:

Abay, Nazmiye Ceren [1]

Zhou, Yan [1]

Kantarcioglu, Murat [1,2]

Thuraisingham, Bhavani [1]

Sweeney, Latanya [2]

Affiliations:

[1] Univ Texas Dallas, Richardson, TX 75083 USA

[2] Harvard Univ, Cambridge, MA 02138 USA

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT I | 2019 / Vol. 11051

Keywords:

Differential privacy; Deep learning; Data generation; NOISE;

DOI:

10.1007/978-3-030-10925-7_31

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For many critical applications ranging from health care to social sciences, releasing personal data while protecting individual privacy is paramount. Over the years, data anonymization and synthetic data generation techniques have been proposed to address this challenge. Unfortunately, data anonymization approaches do not provide rigorous privacy guarantees. Although there are existing synthetic data generation techniques that use rigorous definitions of differential privacy, to our knowledge these techniques have not been compared extensively using different utility metrics. In this work, we provide two novel contributions. First, we compare existing techniques on different datasets using different utility metrics. Second, we present a novel approach that utilizes deep learning techniques coupled with an efficient analysis of privacy costs to generate differentially private synthetic datasets with higher data utility. We show that deep learning models can be trained to capture relationships among multiple features and then used to generate differentially private synthetic datasets. Our extensive experimental evaluation conducted on multiple datasets indicates that our proposed approach is more robust (i.e., one of the top-performing techniques on almost all types of data we experimented with) than the state-of-the-art methods in terms of various data utility measures. Code related to this paper is available at: https://github.com/ncabay/synthetic generation.

Pages: 510-526

Page count: 17
