Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

Cited by: 18
Authors
Diakonikolas, Ilias [1]
Kane, Daniel M. [2,3]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, 1210 W Dayton St, Madison, WI 53706 USA
[2] Univ Calif San Diego, Dept CSE, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Source
2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS 2020) | 2020
Keywords
machine learning; mixtures; Gaussians
DOI
10.1109/FOCS46700.2020.00026
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Let V be any vector space of multivariate degree-d homogeneous polynomials with co-dimension at most k, and let S be the set of points where all polynomials in V nearly vanish. We establish a qualitatively optimal upper bound on the size of epsilon-covers for S, in the l_2-norm. Roughly speaking, we show that there exists an epsilon-cover for S of cardinality M = (k/epsilon)^{O_d(k^{1/d})}. Our result is constructive, yielding an algorithm to compute such an epsilon-cover that runs in time poly(M). Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for k-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with k hidden units (under the Gaussian distribution), density and parameter estimation for k-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for k-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter k. Previous algorithms for these problems had running times exponential in k^{Omega(1)}. At a high level, our algorithms for all of these learning problems work as follows: by computing the low-degree moments of the hidden parameters, we are able to find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result then allows us to compute a quasi-polynomial-sized cover for the set of hidden parameters, which we exploit in our learning algorithms.
Pages: 184-195
Number of pages: 12
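
The abstract above describes a moment-based pipeline: low-degree moments yield polynomials that nearly vanish on the hidden parameters, and a small cover of the near-zero set is then enumerated. The following is an editorial toy sketch, not the paper's algorithm, illustrating that spirit on the simplest possible case: a mixture of k point masses on the real line, where the empirical moments determine a monic degree-k polynomial vanishing on the hidden means via a Prony/Hankel-style linear system, and a crude epsilon-cover of its near-zero set is found by grid search. All function names (empirical_moments, vanishing_poly_coeffs, near_zero_cover) are hypothetical and chosen for this sketch; the paper's cover construction for vector spaces of multivariate polynomials is substantially more involved.

```python
import numpy as np

# Toy sketch (not the paper's algorithm): a mixture of k point masses on the
# real line. The empirical low-degree moments determine a monic degree-k
# polynomial p(x) = prod_i (x - mu_i) that nearly vanishes on the hidden means;
# a crude epsilon-cover of p's near-zero set is then enumerated by grid search.

def empirical_moments(samples, order):
    """Empirical moments E[X^j] for j = 0, ..., order."""
    return np.array([np.mean(samples ** j) for j in range(order + 1)])

def vanishing_poly_coeffs(m, k):
    """Coefficients (lowest degree first) of the monic degree-k polynomial
    annihilated by the moment sequence: sum_{j<k} c_j m_{i+j} = -m_{i+k}
    for i = 0, ..., k-1 (a Prony/Hankel-style linear system)."""
    H = np.array([[m[i + j] for j in range(k)] for i in range(k)])
    rhs = -np.array([m[i + k] for i in range(k)])
    c = np.linalg.solve(H, rhs)
    return np.concatenate([c, [1.0]])

def near_zero_cover(coeffs, eps, lo=-5.0, hi=5.0):
    """Grid points where |p(x)| < eps: a crude epsilon-cover of the
    near-zero set of p over [lo, hi]."""
    grid = np.arange(lo, hi, eps)
    vals = np.polyval(coeffs[::-1], grid)  # np.polyval expects highest degree first
    return grid[np.abs(vals) < eps]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means, k = np.array([-1.0, 2.0]), 2
    labels = rng.integers(0, k, size=100_000)
    samples = true_means[labels]           # noise-free toy mixture of point masses
    m = empirical_moments(samples, 2 * k)
    coeffs = vanishing_poly_coeffs(m, k)
    cover = near_zero_cover(coeffs, eps=0.05)
    print("grid points near the hidden means:", cover)
```

On this noise-free toy the reported grid points cluster around the true means -1 and 2. With sampling noise, or with higher-dimensional hidden parameters, one would instead work with a whole vector space of nearly vanishing polynomials and invoke the paper's structural cover result.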