WHAT CAN WE LEARN PRIVATELY?

Cited by: 363
Authors
Kasiviswanathan, Shiva Prasad [1 ]
Lee, Homin K. [2 ]
Nissim, Kobbi [3 ]
Raskhodnikova, Sofya [4 ]
Smith, Adam [4 ]
Affiliations
[1] Los Alamos Natl Lab, CCS 3, Los Alamos, NM 87545 USA
[2] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[3] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[4] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation; Israel Science Foundation;
Keywords
data privacy; probabilistically approximately correct learning; statistical query learning; differential privacy; STATISTICAL DISCLOSURE CONTROL; RANDOMIZED-RESPONSE; BOUNDS; NOISE; STABILITY; PRIVACY;
DOI
10.1137/090756090
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. Our goal is a broad understanding of the resources required for private learning in terms of samples, computation time, and interaction. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (nonprivate) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples.

We also present a computationally efficient private probabilistically approximately correct learner for the class of parity functions. This result dispels the similarity between learning with noise and private learning (both must be robust to small changes in inputs), since parity is thought to be very hard to learn given random classification noise.

Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Therefore, for local private learning algorithms, the similarity to learning with noise is stronger: local learning is equivalent to SQ learning, and SQ algorithms include most known noise-tolerant learning algorithms. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms. Because of the equivalence to SQ learning, this result also separates adaptive and nonadaptive SQ learning.
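The local (randomized response) model mentioned in the abstract can be illustrated with a minimal sketch: each respondent perturbs their own bit before sending it, so the analyst never sees raw data, yet an unbiased population estimate is still recoverable. The function names and the debiasing step below are illustrative assumptions, not code from the paper; the flip probability e^eps / (1 + e^eps) is the standard choice that makes a single report eps-differentially private.

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (1 + e^eps),
    the flipped bit otherwise. For any single report, the likelihood
    ratio between the two possible true inputs is at most e^eps,
    which is exactly (local) epsilon-differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def estimate_mean(reports, epsilon):
    """Debias the noisy reports: E[report] = p*mu + (1-p)*(1-mu),
    so invert that affine map to recover an unbiased estimate of mu."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Note the trade-off the abstract alludes to: each report is individually private regardless of what the analyst does, but the added noise inflates the variance of the estimate, which is why local algorithms correspond to the (noise-tolerant) statistical query model rather than to general PAC learning.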
Pages: 793-826
Page count: 34