GENERALIZED RESILIENCE AND ROBUST STATISTICS

Times cited: 7
Authors
Zhu, Banghua [1]
Jiao, Jiantao [1]
Steinhardt, Jacob [2]
Affiliations
[1] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
National Science Foundation (NSF)
Keywords
Robust statistics; minimum distance functional; total variation distance perturbation; Wasserstein distance perturbation; regression
DOI
10.1214/22-AOS2186
Chinese Library Classification (CLC) codes
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Robust statistics traditionally focuses on outliers, or perturbations in total variation (TV) distance. However, a dataset could be maliciously corrupted in many other ways, such as systematic measurement errors and missing covariates. We consider corruption in either TV or Wasserstein distance, and show that robust estimation is possible whenever the true population distribution satisfies a property called generalized resilience, which holds under moment or hypercontractive conditions. For the TV corruption model, our finite-sample analysis improves over previous results for mean estimation with bounded k-th moments, linear regression, and joint mean and covariance estimation. For W_1 corruption, we provide the first finite-sample guarantees for second-moment estimation and linear regression. Technically, our robust estimators are a generalization of minimum distance (MD) functionals, which project the corrupted distribution onto a given set of well-behaved distributions. The error of these MD functionals is bounded by a certain modulus of continuity, and we provide a systematic method for upper bounding this modulus for the class of generalized resilient distributions, which usually gives sharp population-level results and good finite-sample guarantees.
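The resilience idea in the abstract can be made concrete in the simplest setting, one-dimensional mean estimation. The sketch below assumes the ordinary (non-generalized) mean-resilience definition from earlier work, which we take to be the special case the paper generalizes: p is (rho, eps)-resilient if every r <= p/(1-eps) satisfies |E_r[X] - E_p[X]| <= rho. For an empirical sample, such an r is a reweighting that deletes an eps-fraction of the mass. The function names `extreme_means`, `mean_resilience`, and `toy_projection_mean` are hypothetical. The last one is only a toy heuristic in the spirit of a minimum distance functional (it restricts the "projection" to windows of consecutive order statistics); it is not the estimator analyzed in the paper.

```python
import numpy as np


def extreme_means(x, eps):
    """Smallest and largest mean over all reweightings r <= p / (1 - eps) of the
    empirical distribution p on the 1-D sample x (i.e. deleting an eps-fraction)."""
    x = np.sort(np.asarray(x, dtype=float))
    cap = 1.0 / ((1.0 - eps) * x.size)  # largest weight any single point may receive

    def greedy(vals):
        # Fill weights up to `cap`, starting from the first value, until total mass is 1.
        remaining, total = 1.0, 0.0
        for v in vals:
            if remaining <= 1e-12:
                break
            w = min(cap, remaining)
            total += w * v
            remaining -= w
        return total

    # Ascending order gives the smallest mean, descending order the largest.
    return greedy(x), greedy(x[::-1])


def mean_resilience(x, eps):
    """Empirical resilience: worst-case shift of the mean after deleting an eps-fraction."""
    lo, hi = extreme_means(x, eps)
    mu = float(np.mean(x))
    return max(hi - mu, mu - lo)


def toy_projection_mean(x, eps):
    """Hypothetical 1-D heuristic (not the paper's estimator): among all windows of
    ceil((1 - eps) * n) consecutive sorted points, keep the one with the smallest
    resilience and return its mean."""
    x = np.sort(np.asarray(x, dtype=float))
    m = int(np.ceil((1.0 - eps) * x.size))
    windows = [x[i:i + m] for i in range(x.size - m + 1)]
    best = min(windows, key=lambda w: mean_resilience(w, eps))
    return float(np.mean(best))
```

For example, `toy_projection_mean(list(range(100)) + [1e6] * 5, 0.1)` discards the five planted outliers and returns a value near the clean mean, whereas the plain sample mean is pulled far away. Restricting the search to consecutive order statistics only works because deleting outliers in one dimension amounts to trimming the extremes; it does not carry over to higher dimensions or to the generalized resilience families the abstract describes.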
Pages: 2256-2283
Page count: 28