SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION: ESTIMATING SQUARED ERROR AND A PHASE TRANSITION

Cited by: 6
Authors
Gamarnik, David [1]
Zadik, Ilias [1]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
Source
ANNALS OF STATISTICS | 2022, Vol. 50, No. 2
Keywords
High-dimensional linear regression; sparsity; overlap gap property; all-or-nothing phenomenon; second moment method; INFORMATION-THEORETIC LIMITS; SUPERPOSITION CODES; LOCAL ALGORITHMS; SIGNAL RECOVERY; VARIABLE SELECTION; SUPPORT RECOVERY; GENE-EXPRESSION; MODEL SELECTION; DENSE; SETS;
DOI
10.1214/21-AOS2130
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
We consider a sparse high-dimensional regression model where the goal is to recover a k-sparse unknown binary vector β* from n noisy linear observations of the form Y = Xβ* + W ∈ R^n, where X ∈ R^{n×p} has i.i.d. N(0, 1) entries and W ∈ R^n has i.i.d. N(0, σ²) entries. In the high signal-to-noise ratio and sublinear sparsity regime, while the sample size needed to recover the unknown vector information-theoretically is known to be of order n* := 2k log p / log(k/σ² + 1), no polynomial-time algorithm is known to succeed unless n > n_alg := (2k + σ²) log p. In this work we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime n ∈ [n*, n_alg]. First, we establish a novel information-theoretic property of the MLE of the problem occurring around n = n* samples, which we coin an "all-or-nothing behavior": when n > n* the MLE recovers almost perfectly the support of β*, while if n < n* it fails to recover any fraction of it correctly. Second, in an attempt to understand the computational hardness in the regime n ∈ [n*, n_alg], we prove that at order n_alg samples there is an Overlap Gap Property (OGP) phase transition occurring in the landscape of the MLE: for constants c, C > 0, when n < c·n_alg the OGP appears in the landscape of the MLE, while if n > C·n_alg the OGP disappears. The OGP is a geometric "disconnectivity" property which originally appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained in establishing the OGP phase transition, we additionally establish various novel positive and negative algorithmic results for the recovery task of interest, including the failure of LASSO with access to n < c·n_alg samples and the success of a simple local search method with access to n > C·n_alg samples.
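The model and the two sample-size thresholds from the abstract can be sketched numerically. The snippet below is an illustration only, not the authors' code: it samples (Y, X, β*) from the stated Gaussian design and computes n* and n_alg, whose gap separates the information-theoretic and conjectured algorithmic regimes; all function names here are invented for the example.

```python
import numpy as np

def thresholds(p, k, sigma2):
    """Return (n_star, n_alg): the information-theoretic threshold
    n* = 2k log p / log(k/sigma^2 + 1) and the conjectured algorithmic
    threshold n_alg = (2k + sigma^2) log p from the abstract."""
    n_star = 2.0 * k * np.log(p) / np.log(k / sigma2 + 1.0)
    n_alg = (2.0 * k + sigma2) * np.log(p)
    return n_star, n_alg

def sample_model(n, p, k, sigma2, rng):
    """Draw Y = X beta* + W with X having i.i.d. N(0,1) entries,
    beta* a k-sparse binary vector, and W i.i.d. N(0, sigma^2) noise."""
    X = rng.standard_normal((n, p))
    beta_star = np.zeros(p)
    beta_star[rng.choice(p, size=k, replace=False)] = 1.0
    W = np.sqrt(sigma2) * rng.standard_normal(n)
    return X @ beta_star + W, X, beta_star

# Example parameters: in the high-SNR, sublinear-sparsity regime
# the gap [n*, n_alg] is wide.
p, k, sigma2 = 10_000, 30, 1.0
n_star, n_alg = thresholds(p, k, sigma2)
print(f"n* ~ {n_star:.0f} samples, n_alg ~ {n_alg:.0f} samples")
```

For k/σ² large, n* is smaller than n_alg by roughly a factor of log(k/σ²), which is the computational-statistical gap the paper studies.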
Pages: 880-903
Page count: 24
Related Papers
50 records total
  • [1] Estimating the error variance in a high-dimensional linear model
    Yu, Guo
    Bien, Jacob
    BIOMETRIKA, 2019, 106 (03) : 533 - 546
  • [2] SPReM: Sparse Projection Regression Model For High-Dimensional Linear Regression
    Sun, Qiang
    Zhu, Hongtu
    Liu, Yufeng
    Ibrahim, Joseph G.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (509) : 289 - 302
  • [3] Variational Bayes for High-Dimensional Linear Regression With Sparse Priors
    Ray, Kolyan
    Szabo, Botond
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (539) : 1270 - 1281
  • [4] Empirical Priors for Prediction in Sparse High-dimensional Linear Regression
    Martin, Ryan
    Tang, Yiqi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [5] Error density estimation in high-dimensional sparse linear model
    Zou, Feng
    Cui, Hengjian
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2020, 72 (02) : 427 - 449
  • [8] Sparse High-Dimensional Isotonic Regression
    Gamarnik, David
    Gaudio, Julia
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Variable selection in high-dimensional sparse multiresponse linear regression models
    Luo, Shan
    STATISTICAL PAPERS, 2020, 61 (03) : 1245 - 1267
  • [10] NEARLY OPTIMAL MINIMAX ESTIMATOR FOR HIGH-DIMENSIONAL SPARSE LINEAR REGRESSION
    Zhang, Li
    ANNALS OF STATISTICS, 2013, 41 (04): 2149 - 2175