UPS DELIVERS OPTIMAL PHASE DIAGRAM IN HIGH-DIMENSIONAL VARIABLE SELECTION

Cited by: 46
Authors
Ji, Pengsheng [1 ]
Jin, Jiashun [2 ]
Affiliations
[1] Cornell Univ, Dept Stat Sci, Ithaca, NY 14853 USA
[2] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (US)
Keywords
Graph; Hamming distance; lasso; Stein's normal means; penalization methods; phase diagram; screen and clean; subset selection; variable selection; regularization
DOI
10.1214/11-AOS947
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Consider a linear model $Y = X\beta + z$, $z \sim N(0, I_n)$, where $X = X_{n,p}$ with both $p$ and $n$ large but $p > n$. We model the rows of $X$ as i.i.d. samples from $N(0, \frac{1}{n}\Omega)$, where $\Omega$ is a $p \times p$ correlation matrix that is unknown to us but presumably sparse. The vector $\beta$ is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method: we screen with univariate thresholding and clean with the penalized MLE. It has two important properties, sure screening and separability after screening, which allow us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation. We measure the performance of a procedure by the Hamming distance, and use an asymptotic framework where $p \to \infty$ while other quantities (e.g., $n$, the sparsity level and the strength of the signals) are linked to $p$ through fixed parameters. We find that in many cases the UPS achieves the optimal rate of convergence. Also, for many different $\Omega$, there is a common three-phase diagram in the two-dimensional phase space quantifying signal sparsity and signal strength. In the first phase, it is possible to recover all signals; in the second, it is possible to recover most of the signals, but not all of them; in the third, successful variable selection is impossible. UPS partitions the phase space in the same way the optimal procedures do, and recovers most of the signals whenever successful variable selection is possible. The lasso and subset selection are well-known approaches to variable selection, yet, somewhat surprisingly, there are regions in the phase space where neither is rate optimal, even in very simple settings such as a tridiagonal $\Omega$, and even when the tuning parameter is ideally set.
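The screen-and-clean recipe in the abstract maps directly onto code. Below is a minimal Python sketch, not the authors' implementation: the function name `ups_sketch` and the tuning values `t_screen`, `delta` and `lam` are illustrative assumptions standing in for the paper's calibrated threshold and penalty, and the cleaning step is a generic L0-penalized least squares run separately on each small post-screening component.

```python
# Sketch of UPS-style screen-and-clean variable selection.
# ASSUMPTIONS: t_screen, delta and lam are illustrative, not the
# paper's calibrated tuning parameters.
import itertools

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def ups_sketch(X, Y, t_screen=1.5, delta=0.25, lam=2.0):
    """Screen with univariate thresholding, clean with L0-penalized LS."""
    n, p = X.shape
    beta_hat = np.zeros(p)

    # Screening: keep coordinates whose marginal statistic |(X'Y)_j| is large.
    y_tilde = X.T @ Y
    survivors = np.flatnonzero(np.abs(y_tilde) > t_screen)
    if survivors.size == 0:
        return beta_hat

    # Separability after screening: survivors split into small connected
    # components of the Gram graph, where j ~ k iff |(X'X)_{jk}| > delta.
    G = X[:, survivors].T @ X[:, survivors]
    np.fill_diagonal(G, 0.0)
    adj = csr_matrix((np.abs(G) > delta).astype(int))
    n_comp, labels = connected_components(adj, directed=False)

    # Cleaning: exhaustive L0-penalized least squares within each component;
    # feasible only because each component has few nodes after screening.
    for c in range(n_comp):
        comp = survivors[labels == c]
        best_obj, best_subset, best_coef = np.sum(Y ** 2), (), None
        for k in range(1, comp.size + 1):
            for subset in itertools.combinations(comp.tolist(), k):
                Xs = X[:, list(subset)]
                coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
                obj = np.sum((Y - Xs @ coef) ** 2) + lam * k
                if obj < best_obj:
                    best_obj, best_subset, best_coef = obj, subset, coef
        if best_coef is not None:
            beta_hat[list(best_subset)] = best_coef
    return beta_hat
```

A hypothetical usage, scoring the selected support by the Hamming distance that the paper uses as its loss:

```python
rng = np.random.default_rng(0)
n, p, s = 200, 500, 8
X = rng.normal(scale=1 / np.sqrt(n), size=(n, p))  # rows ~ N(0, (1/n) I)
beta = np.zeros(p)
beta[rng.choice(p, s, replace=False)] = 4.0
Y = X @ beta + rng.normal(size=n)
hamming = np.sum((ups_sketch(X, Y) != 0) != (beta != 0))  # selection errors
```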
Pages: 73 - 103
Page count: 31
Related Papers (50 in total)
  • [41] Bayesian Regression Trees for High-Dimensional Prediction and Variable Selection
    Linero, Antonio R.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (522) : 626 - 636
  • [42] Stochastic variational variable selection for high-dimensional microbiome data
    Dang, Tung
    Kumaishi, Kie
    Usui, Erika
    Kobori, Shungo
    Sato, Takumi
    Toda, Yusuke
    Yamasaki, Yuji
    Tsujimoto, Hisashi
    Ichihashi, Yasunori
    Iwata, Hiroyoshi
    MICROBIOME, 2022, 10 (01)
  • [43] Variable selection in high-dimensional double generalized linear models
    Xu, Dengke
    Zhang, Zhongzhan
    Wu, Liucang
    STATISTICAL PAPERS, 2014, 55 (02) : 327 - 347
  • [46] Prediction and Variable Selection in High-Dimensional Misspecified Binary Classification
    Furmanczyk, Konrad
    Rejchel, Wojciech
    ENTROPY, 2020, 22 (05)
  • [47] High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking
    Wang, Fan
    Mukherjee, Sach
    Richardson, Sylvia
    Hill, Steven M.
    STATISTICS AND COMPUTING, 2020, 30 (03) : 697 - 719
  • [48] Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression
    Latouche, Pierre
    Mattei, Pierre-Alexandre
    Bouveyron, Charles
    Chiquet, Julien
    JOURNAL OF MULTIVARIATE ANALYSIS, 2016, 146 : 177 - 190
  • [49] A Model Selection Criterion for High-Dimensional Linear Regression
    Owrang, Arash
    Jansson, Magnus
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (13) : 3436 - 3446
  • [50] High-dimensional sparse portfolio selection with nonnegative constraint
    Xia, Siwei
    Yang, Yuehan
    Yang, Hu
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 443