Differentially private nearest neighbor classification

被引:0
作者
Mehmet Emre Gursoy
Ali Inan
Mehmet Ercan Nergiz
Yucel Saygin
机构
[1] Georgia Institute of Technology,College of Computing
[2] Adana Science and Technology University,Computer Engineering Department
[3] Acadsoft Research,Faculty of Engineering and Natural Sciences
[4] Sabanci University,undefined
来源
Data Mining and Knowledge Discovery | 2017年 / 31卷
关键词
Data mining; Differential privacy; -Nearest neighbors;
D O I
暂无
中图分类号
学科分类号
摘要
Instance-based learning, and the k-nearest neighbors algorithm (k-NN) in particular, provide simple yet effective classification algorithms for data mining. Classifiers are often executed on sensitive information such as medical or personal data. Differential privacy has recently emerged as the accepted standard for privacy protection in sensitive data. However, straightforward applications of differential privacy to k-NN classification yield rather inaccurate results. Motivated by this, we develop algorithms to increase the accuracy of private instance-based classification. We first describe the radius neighbors classifier (r-N) and show that its accuracy under differential privacy can be greatly improved by a non-trivial sensitivity analysis. Then, for k-NN classification, we build algorithms that convert k-NN classifiers to r-N classifiers. We experimentally evaluate the accuracy of both classifiers using various datasets. Experiments show that our proposed classifiers significantly outperform baseline private classifiers (i.e., straightforward applications of differential privacy) and executing the classifiers on a dataset published using differential privacy. In addition, the accuracy of our proposed k-NN classifiers are at least comparable to, and in many cases better than, the other differentially private machine learning techniques.
引用
收藏
页码:1544 / 1575
页数:31
相关论文
共 58 条
[1]  
Alcala J(2010)KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework J Mult. Valued Log. Soft Comput. 17 255-287
[2]  
Fernandez A(2013)A near-optimal algorithm for differentially-private principal components J Mach Learn Res 14 2905-2943
[3]  
Luengo J(2007)Unsupervised learning with normalised data and non-Euclidean norms Appl Soft Comput 7 203-210
[4]  
Derrac J(2008)On the difficulties of disclosure prevention in statistical databases or the case for differential privacy J Priv Confidentiality 2 8-1693
[5]  
Garcia S(2012)Universally utility-maximizing privacy mechanisms SIAM J Comput 41 1673-352
[6]  
Sanchez L(2014)A data-and workload-aware algorithm for range queries under differential privacy Proc VLDB Endow 7 341-450
[7]  
Herrera F(2011)Personalized social recommendations: accurate or private? Proc VLDB Endow 4 440-309
[8]  
Chaudhuri K(2010)k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction Pharmacogenomics J 10 292-100
[9]  
Sarwate AD(2012)Learning in a large function space: privacy-preserving mechanisms for SVM learning J Priv Confidentiality 4 65-94
[10]  
Sinha K(2013)Signal processing and machine learning with differential privacy: algorithms and challenges for continuous data IEEE Signal Process Mag 30 86-930