Differentially Private K-Means Clustering

被引:86
作者
Su, Dong [1 ]
Cao, Jianneng [2 ]
Li, Ninghui [1 ]
Bertino, Elisa [1 ]
Jin, Hongxia [3 ]
机构
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Inst Infocomm Res, Singapore, Singapore
[3] Samsung Res Amer, Mountain View, CA USA
来源
CODASPY'16: PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY | 2016年
基金
美国国家科学基金会;
关键词
Differential privacy; k-means clustering; Private data publishing;
D O I
10.1145/2857705.2857708
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
There are two broad approaches for differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the effectiveness of the two approaches on differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of DPLloyd which is a differentially private version of the Lloyd algorithm. We also propose a non-interactive approach EUGkM which publishes a differentially private synopsis for k-means clustering. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of our improvement on DPLloyd and the proposed EUGkM algorithm.
引用
收藏
页码:26 / 37
页数:12
相关论文
共 33 条
[1]  
[Anonymous], 2011, Advances in Neural Information Processing Systems
[2]  
[Anonymous], 2010, UCI machine learning repository
[3]  
[Anonymous], 2003, P 22 ACM SIGMOD SIGA
[4]  
Bhaskar R., 2010, KDD, P503, DOI DOI 10.1145/1835804.1835869
[5]  
Blum A., 2005, PODS'05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, P128, DOI DOI 10.1145/1065167.1065184
[6]  
Chaudhuri K., 2008, ADV NEURAL INFORM PR, V21
[7]   Differentially Private Spatial Decompositions [J].
Cormode, Graham ;
Procopiuc, Cecilia ;
Srivastava, Divesh ;
Shen, Entong ;
Yu, Ting .
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, :20-31
[8]  
Dong CL, 2013, INT CONF MACH LEARN, P664, DOI 10.1109/ICMLC.2013.6890373
[9]  
Dwork C, 2004, LECT NOTES COMPUT SC, V3152, P528
[10]   Calibrating noise to sensitivity in private data analysis [J].
Dwork, Cynthia ;
McSherry, Frank ;
Nissim, Kobbi ;
Smith, Adam .
THEORY OF CRYPTOGRAPHY, PROCEEDINGS, 2006, 3876 :265-284