Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Cited by: 18
Authors
Diakonikolas, Ilias [1]
Kane, Daniel M. [2,3]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, 1210 W Dayton St, Madison, WI 53706 USA
[2] Univ Calif San Diego, Dept CSE, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Source
2020 IEEE 61ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2020) | 2020
Keywords
MIXTURES; GAUSSIANS
DOI
10.1109/FOCS46700.2020.00026
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Let V be any vector space of multivariate degree-d homogeneous polynomials with co-dimension at most k, and let S be the set of points where all polynomials in V nearly vanish. We establish a qualitatively optimal upper bound on the size of ε-covers for S in the ℓ_2-norm. Roughly speaking, we show that there exists an ε-cover for S of cardinality M = (k/ε)^{O_d(k^{1/d})}. Our result is constructive, yielding an algorithm to compute such an ε-cover that runs in time poly(M). Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for k-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with k hidden units (under the Gaussian distribution), density and parameter estimation for k-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for k-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter k; previous algorithms for these problems had running times exponential in k^{Ω(1)}. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
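To make the cover bound concrete, the following LaTeX sketch restates the structural result using only what the abstract says; the near-vanishing threshold δ and the restriction of S to unit-norm points are assumptions introduced here for readability, not notation taken from the paper.

% V: a vector space of degree-d homogeneous polynomials on R^n with codimension at most k.
% S: the set of (unit-norm, by assumption here) points at which every p in V nearly vanishes.
\[
  S \;=\; \bigl\{\, x \in \mathbb{R}^n,\ \|x\|_2 = 1 \;:\; |p(x)| \le \delta \ \text{for all } p \in V \,\bigr\},
  \qquad \operatorname{codim}(V) \le k .
\]
% Structural result: an epsilon-cover of S in the l_2-norm of quasi-polynomial cardinality M,
% computable in time polynomial in M.
\[
  \exists\, C \subseteq \mathbb{R}^n \ \text{with} \ |C| \;=\; M \;=\; (k/\epsilon)^{O_d(k^{1/d})}
  \ \text{such that} \
  \forall\, x \in S \ \ \exists\, c \in C : \ \|x - c\|_2 \le \epsilon ,
\]
\[
  \text{and such a cover } C \text{ can be computed in time } \mathrm{poly}(M).
\]

Here O_d(·) is read as hiding a constant factor that may depend on the degree d.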
Pages: 184-195
Number of pages: 12