Bounds on the sample complexity for private learning and private data release

被引:58
作者
Beimel, Amos [1 ]
Brenner, Hai [2 ]
Kasiviswanathan, Shiva Prasad [3 ]
Nissim, Kobbi [1 ]
机构
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Math, IL-84105 Beer Sheva, Israel
[3] Gen Elect Res Ctr, San Ramon, CA USA
基金
以色列科学基金会;
关键词
Differential privacy; PAC learning; Sample complexity; Private data release; NOISE;
D O I
10.1007/s10994-013-5404-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Learning is a task that generalizes many of the analyses that are applied to collections of data, in particular, to collections of sensitive individual information. Hence, it is natural to ask what can be learned while preserving individual privacy. Kasiviswanathan et al. (in SIAM J. Comput., 40(3):793-826, 2011) initiated such a discussion. They formalized the notion of private learning, as a combination of PAC learning and differential privacy, and investigated what concept classes can be learned privately. Somewhat surprisingly, they showed that for finite, discrete domains (ignoring time complexity), every PAC learning task could be performed privately with polynomially many labeled examples; in many natural cases this could even be done in polynomial time. While these results seem to equate non-private and private learning, there is still a significant gap: the sample complexity of (non-private) PAC learning is crisply characterized in terms of the VC-dimension of the concept class, whereas this relationship is lost in the constructions of private learners, which exhibit, generally, a higher sample complexity. Looking into this gap, we examine several private learning tasks and give tight bounds on their sample complexity. In particular, we show strong separations between sample complexities of proper and improper private learners (such separation does not exist for non-private learners), and between sample complexities of efficient and inefficient proper private learners. Our results show that VC-dimension is not the right measure for characterizing the sample complexity of proper private learning. We also examine the task of private data release (as initiated by Blum et al. in STOC, pp. 609-618, 2008), and give new lower bounds on the sample complexity. Our results show that the logarithmic dependence on size of the instance space is essential for private data release.
引用
收藏
页码:401 / 437
页数:37
相关论文
共 27 条
  • [1] Beimel A, 2010, LECT NOTES COMPUT SC, V5978, P437, DOI 10.1007/978-3-642-11799-2_26
  • [2] Beimel Amos, 2013, ITCS, P97
  • [3] A Learning Theory Approach to Noninteractive Database Privacy
    Blum, Avrim
    Ligett, Katrina
    Roth, Aaron
    [J]. JOURNAL OF THE ACM, 2013, 60 (02)
  • [4] Blum A, 2008, ACM S THEORY COMPUT, P609
  • [5] Blum Avrim, 2005, P 24 ACM SIGMOD SIGA, P128, DOI [DOI 10.1145/1065167.1065184, 10.1145/1065167.1065184]
  • [6] LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION
    BLUMER, A
    EHRENFEUCHT, A
    HAUSSLER, D
    WARMUTH, MK
    [J]. JOURNAL OF THE ACM, 1989, 36 (04) : 929 - 965
  • [7] Chaudhuri Kamalika, 2011, JMLR Workshop Conf Proc, V2011, P155
  • [8] Chaudhuri K, 2011, J MACH LEARN RES, V12, P1069
  • [9] A MEASURE OF ASYMPTOTIC EFFICIENCY FOR TESTS OF A HYPOTHESIS BASED ON THE SUM OF OBSERVATIONS
    CHERNOFF, H
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1952, 23 (04): : 493 - 507
  • [10] Calibrating noise to sensitivity in private data analysis
    Dwork, Cynthia
    McSherry, Frank
    Nissim, Kobbi
    Smith, Adam
    [J]. THEORY OF CRYPTOGRAPHY, PROCEEDINGS, 2006, 3876 : 265 - 284