Robust estimation via generalized quasi-gradients

Cited by: 9
Authors
Zhu, Banghua [1]
Jiao, Jiantao [1 ]
Steinhardt, Jacob [2 ]
Affiliations
[1] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
US National Science Foundation;
Keywords
robust statistics; generalized quasi-gradients; gradient descent; breakdown point; high dimensions; covariance; asymptotics
DOI
10.1093/imaiai/iaab018
Chinese Library Classification (CLC)
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of 'generalized quasi-gradients'. Whenever these quasi-gradients exist, a large family of no-regret algorithms are guaranteed to approximate the global minimum; this includes the commonly used filtering algorithm. For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level ε < 1/3. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point 1/3. With carefully designed initialization and step size, we improve this to 1/2, which is optimal. For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from O(√ε) to the optimal rate of O(ε) for small ε, assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point 1/3 and iteration complexity Õ(d/ε²).
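To make the filtering algorithm mentioned in the abstract concrete, the sketch below gives a minimal NumPy implementation of a generic filter for robust mean estimation under bounded covariance: it maintains a weight per sample and softly downweights the points that dominate the top eigendirection of the weighted covariance. The function name filter_mean, the iteration cap, and the spectral-norm stopping threshold spec_tol are illustrative assumptions, not the paper's exact procedure or constants.

```python
import numpy as np

def filter_mean(X, iters=50, spec_tol=2.0):
    """Sketch of a filtering algorithm for robust mean estimation.

    Assumes the inlier distribution has covariance bounded by the
    identity; spec_tol and iters are illustrative choices, not the
    paper's constants.
    """
    n, d = X.shape
    w = np.ones(n) / n                 # uniform weights over samples
    for _ in range(iters):
        mu = w @ X                     # weighted mean
        Z = X - mu
        cov = Z.T @ (Z * w[:, None])   # weighted covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= spec_tol:    # spectral norm consistent with inliers
            break
        # Outlier scores: squared projections onto the top eigendirection.
        tau = (Z @ eigvecs[:, -1]) ** 2
        # Soft downweighting: the worst point is zeroed out, others shrink
        # in proportion to their score.
        w = w * (1.0 - tau / tau.max())
        w = w / w.sum()
    return w @ X

# Illustrative usage with 10% of points shifted far from the true mean 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X[:100] += 10.0
print(np.linalg.norm(X.mean(axis=0)))   # naive mean is pulled off by outliers
print(np.linalg.norm(filter_mean(X)))   # filtered estimate stays near 0
```

The multiplicative downweighting step is the kind of no-regret update the abstract refers to: following the generalized quasi-gradient of the weighted spectral-norm objective is what guarantees progress toward the global minimum even though the problem is non-convex.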
Pages: 581-636
Page count: 56