Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Cited by: 18
Authors
Diakonikolas, Ilias [1]
Kane, Daniel M. [2,3]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, 1210 W Dayton St, Madison, WI 53706 USA
[2] Univ Calif San Diego, Dept CSE, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Source
2020 IEEE 61ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2020) | 2020
Keywords
MIXTURES; GAUSSIANS
DOI
10.1109/FOCS46700.2020.00026
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Let V be any vector space of multivariate degree-d homogeneous polynomials with co-dimension at most k, and let S be the set of points where all polynomials in V nearly vanish. We establish a qualitatively optimal upper bound on the size of ε-covers for S in the ℓ_2-norm. Roughly speaking, we show that there exists an ε-cover for S of cardinality M = (k/ε)^{O_d(k^{1/d})}. Our result is constructive, yielding an algorithm to compute such an ε-cover that runs in time poly(M). Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for k-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with k hidden units (under the Gaussian distribution), density and parameter estimation for k-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for k-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter k; previous algorithms for these problems had running times exponential in k^{Ω(1)}. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
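To make the cover bound concrete, the following LaTeX sketch restates the structural result using only what the abstract says; the near-vanishing threshold δ and the restriction of S to unit-norm points are assumptions introduced here for readability, not notation taken from the paper.

% V: a vector space of degree-d homogeneous polynomials on R^n with codimension at most k.
% S: the set of (unit-norm, by assumption here) points at which every p in V nearly vanishes.
\[
  S \;=\; \bigl\{\, x \in \mathbb{R}^n,\ \|x\|_2 = 1 \;:\; |p(x)| \le \delta \ \text{for all } p \in V \,\bigr\},
  \qquad \operatorname{codim}(V) \le k .
\]
% Structural result: an epsilon-cover of S in the l_2-norm of quasi-polynomial cardinality M,
% computable in time polynomial in M.
\[
  \exists\, C \subseteq \mathbb{R}^n \ \text{with} \ |C| \;=\; M \;=\; (k/\epsilon)^{O_d(k^{1/d})}
  \ \text{such that} \
  \forall\, x \in S \ \ \exists\, c \in C : \ \|x - c\|_2 \le \epsilon ,
\]
\[
  \text{and such a cover } C \text{ can be computed in time } \mathrm{poly}(M).
\]

Here O_d(·) is read as hiding a constant factor that may depend on the degree d.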
Pages: 184-195
Number of pages: 12