Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Cited by: 18
Authors
Diakonikolas, Ilias [1]
Kane, Daniel M. [2,3]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, 1210 W Dayton St, Madison, WI 53706 USA
[2] Univ Calif San Diego, Dept CSE, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Source
2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS 2020) | 2020
Keywords
machine learning; mixtures; Gaussians
DOI
10.1109/FOCS46700.2020.00026
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Let V be any vector space of multivariate degree-d homogeneous polynomials with co-dimension at most k, and let S be the set of points where all polynomials in V nearly vanish. We establish a qualitatively optimal upper bound on the size of epsilon-covers for S, in the l_2-norm. Roughly speaking, we show that there exists an epsilon-cover for S of cardinality M = (k/epsilon)^{O_d(k^{1/d})}. Our result is constructive, yielding an algorithm to compute such an epsilon-cover that runs in time poly(M). Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for k-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with k hidden units (under the Gaussian distribution), density and parameter estimation for k-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for k-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter k. Previous algorithms for these problems had running times exponential in k^{Omega(1)}. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we are able to find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
Pages: 184-195
Number of pages: 12
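
The abstract above describes a moment-based pipeline: low-degree moments yield polynomials that nearly vanish on the hidden parameters, and a small cover of the near-zero set is then enumerated. The following is an editorial toy sketch, not the paper's algorithm, illustrating that spirit on the simplest possible case: a mixture of k point masses on the real line, where the empirical moments determine a monic degree-k polynomial vanishing on the hidden means via a Prony/Hankel-style linear system, and a crude epsilon-cover of its near-zero set is found by grid search. All function names (empirical_moments, vanishing_poly_coeffs, near_zero_cover) are hypothetical and chosen for this sketch; the paper's cover construction for vector spaces of multivariate polynomials is substantially more involved.

```python
import numpy as np

# Toy sketch (not the paper's algorithm): a mixture of k point masses on the
# real line. The empirical low-degree moments determine a monic degree-k
# polynomial p(x) = prod_i (x - mu_i) that nearly vanishes on the hidden means;
# a crude epsilon-cover of p's near-zero set is then enumerated by grid search.

def empirical_moments(samples, order):
    """Empirical moments E[X^j] for j = 0, ..., order."""
    return np.array([np.mean(samples ** j) for j in range(order + 1)])

def vanishing_poly_coeffs(m, k):
    """Coefficients (lowest degree first) of the monic degree-k polynomial
    annihilated by the moment sequence: sum_{j<k} c_j m_{i+j} = -m_{i+k}
    for i = 0, ..., k-1 (a Prony/Hankel-style linear system)."""
    H = np.array([[m[i + j] for j in range(k)] for i in range(k)])
    rhs = -np.array([m[i + k] for i in range(k)])
    c = np.linalg.solve(H, rhs)
    return np.concatenate([c, [1.0]])

def near_zero_cover(coeffs, eps, lo=-5.0, hi=5.0):
    """Grid points where |p(x)| < eps: a crude epsilon-cover of the
    near-zero set of p over [lo, hi]."""
    grid = np.arange(lo, hi, eps)
    vals = np.polyval(coeffs[::-1], grid)  # np.polyval expects highest degree first
    return grid[np.abs(vals) < eps]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means, k = np.array([-1.0, 2.0]), 2
    labels = rng.integers(0, k, size=100_000)
    samples = true_means[labels]           # noise-free toy mixture of point masses
    m = empirical_moments(samples, 2 * k)
    coeffs = vanishing_poly_coeffs(m, k)
    cover = near_zero_cover(coeffs, eps=0.05)
    print("grid points near the hidden means:", cover)
```

On this noise-free toy the reported grid points cluster around the true means -1 and 2. With sampling noise, or with higher-dimensional hidden parameters, one would instead work with a whole vector space of nearly vanishing polynomials and invoke the paper's structural cover result.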