THE GEOMETRY OF DIFFERENTIAL PRIVACY: THE SMALL DATABASE AND APPROXIMATE CASES

被引:18
作者
Nikolov, Aleksandar [1 ]
Talwar, Kunal [2 ]
Zhang, Li [2 ]
机构
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
[2] Microsoft Res, Mountain View, CA 94043 USA
关键词
differential privacy; statistical estimation; convex geometry; combinatorial discrepancy; ALGORITHMS; POLYTOPES; MECHANISM;
D O I
10.1137/130938943
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries and has been a focus of a long line of work. For a given set of d linear queries A over a database x is an element of R-N, we seek to find the differentially private mechanism that has the minimum mean squared error relative to the true answer Ax. For pure differential privacy, Hart and Talwar and Bhaskara et al. give an O(log(2) d) approximation to the optimal mechanism. Our first contribution is to give an O(log(2) d) approximation guarantee for the case of (epsilon, delta)-approximate differential privacy. Our mechanism adds carefully chosen correlated Gaussian noise to the answers and runs in time polynomial in d and N. The mechanism is based on a recursive construction of a "small" enclosing ellipsoid of the sensitivity polytope AB(1)(N), where B-1(N) denotes the N-dimensional l(1) ball. Optimality is established relative to the spectral lower bound, through a variant of the classical Bourgain-Tzafriri restricted invertibility theorem. We next consider this question in the case when the number of individuals n = parallel to x parallel to(1) in the database is smaller than the number of queries d. We call this the small database setting. Since the work of Blum, Ligett, and Roth [STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, 2008, pp. 609-618.] it has been known that such a restriction can allow for mechanisms with lower error. Our second main contribution is to give, for both pure and (epsilon, delta)-privacy, differentially private mechanisms which have mean squared error within polylog(d, N) of the optimal for any A and n. The approximation is achieved by first applying the mechanism without considering the size of the database and then projecting the answer to the convex hull of feasible answers space nAB(1)(N), which is guaranteed to contain the true answer Ax. By applying techniques from statistical estimation, we show such mechanisms can achieve the claimed approximation. When applied to counting queries, i.e., A with entries in {0, 1}, our mechanism achieves (O) over tilde(root n) average error for both pure and (epsilon, delta)-privacy. (O) over tilde hides polylogarithmic factors in all the parameters such as d, n, N, and 1/delta. Previously, such a bound was only known for (epsilon, delta)-privacy. For pure privacy, this bound improves upon the (O) over tilde (n 2/3) bound of Blum, Ligett, and Roth and matches the lower bound implied by I. Dinur and K. Nissim, Proceedings of the 22nd PODS, ACM, 2003, pp. 202-210 up to logarithmic factors. The third question we consider is the accuracy gap between the pure and approximate privacy notions. By applying tools from convex geometry, we make a connection between the hereditary discrepancy lower bound of S. Muthukrishnan and A. Nikolov, STOC '12: Proceedings of the 44th Symposium on Theory of Computing, 2012, pp. 1285-1292 for (epsilon, delta)-privacy and the volume lower bound of Hardt and Talwar for pure privacy. We are able to show that the gap in error between the optimal mechanisms satisfying pure and (epsilon, delta)-differential privacy is bounded by a factor of O(polylog(d, N)). The connection between hereditary discrepancy and private mechanisms also enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.
引用
收藏
页码:575 / 616
页数:42
相关论文
共 66 条
[1]   A RANDOMIZED SCHEME FOR SPEEDING-UP ALGORITHMS FOR LINEAR AND CONVEX-PROGRAMMING PROBLEMS WITH HIGH CONSTRAINTS-TO-VARIABLES RATIO [J].
ADLER, I ;
SHAMIR, R .
MATHEMATICAL PROGRAMMING, 1993, 61 (01) :39-52
[2]  
[Anonymous], 2011, P 2011 ACM SIGMOD IN
[3]  
[Anonymous], 2003, P 22 ACM SIGMOD SIGA
[4]  
Austrin P., 2013, ECCC
[5]   Constructive Algorithms for Discrepancy Minimization [J].
Bansal, Nikhil .
2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2010, :3-10
[6]  
Barak B., 2007, P 26 ACM SIGMOD SIGA, P273, DOI DOI 10.1145/1265530.1265569
[7]   APPROXIMATION OF THE SPHERE BY POLYTOPES HAVING FEW VERTICES [J].
BARANY, I ;
FUREDI, Z .
PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY, 1988, 102 (03) :651-659
[8]  
BECK J, 1995, HDB COMBINATORICS, V2, P1405
[9]  
Bhaskara A, 2012, STOC'12: PROCEEDINGS OF THE 2012 ACM SYMPOSIUM ON THEORY OF COMPUTING, P1269
[10]  
Blum A, 2008, ACM S THEORY COMPUT, P609