The complexity of approximating the entropy

Cited by: 56
Authors
Batu, T [1]
Dasgupta, S
Kumar, R
Rubinfeld, R
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[3] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
[4] MIT, CSAIL, Cambridge, MA 02139 USA
Keywords
entropy estimation; sublinear algorithms; properties of distributions; property testing
DOI
10.1137/S0097539702403645
CLC number
TP301 [Theory, Methods]
Subject classification number
081202
Abstract
We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a γ-multiplicative approximation to the entropy can be obtained in O(n^((1+η)/γ²) log n) time for distributions with entropy Ω(γ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^(1/(2γ²))). We next consider a combined oracle model in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for γ-multiplicative approximation to the entropy that runs in O((γ² log² n)/(h²(γ-1)²)) time for distributions with entropy Ω(h); for such distributions, we also show a lower bound of Ω((log n)/(h(γ²-1) + γ²)). Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
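To make the two main oracle models concrete, the following sketch contrasts them: in the evaluation oracle model the entropy is computed directly from the explicit probability array, while in the generation oracle model it is estimated from i.i.d. samples. The sample-based estimator shown here is the standard plug-in (maximum-likelihood) estimate, used only as an illustration; it is not the authors' sublinear algorithm, whose point is precisely to avoid reading the whole empirical distribution.

```python
import math
import random
from collections import Counter

def entropy_from_probabilities(probs):
    """Evaluation oracle model: the full probability array is given,
    so the entropy (in bits) is computed exactly in linear time."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def plugin_entropy_from_samples(samples):
    """Generation oracle model: only i.i.d. samples are available.
    This is the naive plug-in estimate from empirical frequencies,
    shown for illustration (not the paper's sublinear algorithm)."""
    counts = Counter(samples)
    m = len(samples)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

# Uniform distribution over 4 elements: true entropy is log2(4) = 2 bits.
exact = entropy_from_probabilities([0.25, 0.25, 0.25, 0.25])

random.seed(0)
samples = [random.randrange(4) for _ in range(100_000)]
estimate = plugin_entropy_from_samples(samples)
```

With 100,000 samples over a 4-element domain the plug-in estimate lands very close to the true value of 2 bits, but for large domains its sample cost grows, which motivates the sublinear bounds studied in the paper.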
Pages: 132-150
Page count: 19