SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION: ESTIMATING SQUARED ERROR AND A PHASE TRANSITION

Cited by: 6
Authors
Gamarnik, David [1]
Zadik, Ilias [1]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
Source
ANNALS OF STATISTICS | 2022, Vol. 50, No. 2
Keywords
High-dimensional linear regression; sparsity; overlap gap property; all-or-nothing phenomenon; second moment method; INFORMATION-THEORETIC LIMITS; SUPERPOSITION CODES; LOCAL ALGORITHMS; SIGNAL RECOVERY; VARIABLE SELECTION; SUPPORT RECOVERY; GENE-EXPRESSION; MODEL SELECTION; DENSE; SETS;
DOI
10.1214/21-AOS2130
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
We consider a sparse high-dimensional regression model where the goal is to recover a k-sparse unknown binary vector β* from n noisy linear observations of the form Y = Xβ* + W ∈ R^n, where X ∈ R^{n×p} has i.i.d. N(0, 1) entries and W ∈ R^n has i.i.d. N(0, σ²) entries. In the high signal-to-noise ratio and sublinear sparsity regime, while the sample size needed to recover the unknown vector information-theoretically is known to be of order n* := 2k log p / log(k/σ² + 1), no polynomial-time algorithm is known to succeed unless n > n_alg := (2k + σ²) log p. In this work we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime n ∈ [n*, n_alg]. First, we establish a novel information-theoretic property of the MLE of the problem occurring around n = n* samples, which we coin an "all-or-nothing behavior": when n > n* the MLE recovers almost perfectly the support of β*, while if n < n* it fails to recover any fraction of it correctly. Second, in an attempt to understand the computational hardness in the regime n ∈ [n*, n_alg], we prove that at order n_alg samples there is an Overlap Gap Property (OGP) phase transition occurring in the landscape of the MLE: for constants c, C > 0, when n < c·n_alg the OGP appears in the landscape of the MLE, while if n > C·n_alg the OGP disappears. The OGP is a geometric "disconnectivity" property which originally appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained in establishing the OGP phase transition, we additionally establish various novel positive and negative algorithmic results for the recovery task of interest, including the failure of LASSO with access to n < c·n_alg samples and the success of a simple local search method with access to n > C·n_alg samples.
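The model and the two sample-size thresholds from the abstract can be sketched numerically. The snippet below is an illustration only, not the authors' code: it samples (Y, X, β*) from the stated Gaussian design and computes n* and n_alg, whose gap separates the information-theoretic and conjectured algorithmic regimes; all function names here are invented for the example.

```python
import numpy as np

def thresholds(p, k, sigma2):
    """Return (n_star, n_alg): the information-theoretic threshold
    n* = 2k log p / log(k/sigma^2 + 1) and the conjectured algorithmic
    threshold n_alg = (2k + sigma^2) log p from the abstract."""
    n_star = 2.0 * k * np.log(p) / np.log(k / sigma2 + 1.0)
    n_alg = (2.0 * k + sigma2) * np.log(p)
    return n_star, n_alg

def sample_model(n, p, k, sigma2, rng):
    """Draw Y = X beta* + W with X having i.i.d. N(0,1) entries,
    beta* a k-sparse binary vector, and W i.i.d. N(0, sigma^2) noise."""
    X = rng.standard_normal((n, p))
    beta_star = np.zeros(p)
    beta_star[rng.choice(p, size=k, replace=False)] = 1.0
    W = np.sqrt(sigma2) * rng.standard_normal(n)
    return X @ beta_star + W, X, beta_star

# Example parameters: in the high-SNR, sublinear-sparsity regime
# the gap [n*, n_alg] is wide.
p, k, sigma2 = 10_000, 30, 1.0
n_star, n_alg = thresholds(p, k, sigma2)
print(f"n* ~ {n_star:.0f} samples, n_alg ~ {n_alg:.0f} samples")
```

For k/σ² large, n* is smaller than n_alg by roughly a factor of log(k/σ²), which is the computational-statistical gap the paper studies.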
Pages: 880-903
Page count: 24
Related Papers
50 records total
  • [1] Estimating the error variance in a high-dimensional linear model
    Yu, Guo
    Bien, Jacob
    BIOMETRIKA, 2019, 106 (03) : 533 - 546
  • [2] SPReM: Sparse Projection Regression Model For High-Dimensional Linear Regression
    Sun, Qiang
    Zhu, Hongtu
    Liu, Yufeng
    Ibrahim, Joseph G.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (509) : 289 - 302
  • [3] Variational Bayes for High-Dimensional Linear Regression With Sparse Priors
    Ray, Kolyan
    Szabo, Botond
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (539) : 1270 - 1281
  • [4] Empirical Priors for Prediction in Sparse High-dimensional Linear Regression
    Martin, Ryan
    Tang, Yiqi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [5] Error density estimation in high-dimensional sparse linear model
    Zou, Feng
    Cui, Hengjian
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2020, 72 (02) : 427 - 449
  • [8] Sparse High-Dimensional Isotonic Regression
    Gamarnik, David
    Gaudio, Julia
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Variable selection in high-dimensional sparse multiresponse linear regression models
    Luo, Shan
    STATISTICAL PAPERS, 2020, 61 (03) : 1245 - 1267
  • [10] NEARLY OPTIMAL MINIMAX ESTIMATOR FOR HIGH-DIMENSIONAL SPARSE LINEAR REGRESSION
    Zhang, Li
    ANNALS OF STATISTICS, 2013, 41 (04): 2149 - 2175