UPS DELIVERS OPTIMAL PHASE DIAGRAM IN HIGH-DIMENSIONAL VARIABLE SELECTION

Cited by: 46
Authors
Ji, Pengsheng [1 ]
Jin, Jiashun [2 ]
Affiliations
[1] Cornell Univ, Dept Stat Sci, Ithaca, NY 14853 USA
[2] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation
Keywords
Graph; Hamming distance; lasso; Stein's normal means; penalization methods; phase diagram; screen and clean; subset selection; variable selection; regularization
DOI
10.1214/11-AOS947
CLC classification: O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes: 020208; 070103; 0714
Abstract
Consider a linear model Y = X beta + z, z ~ N(0, I_n). Here X = X_{n,p}, where both p and n are large but p > n. We model the rows of X as i.i.d. samples from N(0, (1/n) Omega), where Omega is a p x p correlation matrix, unknown to us but presumably sparse. The vector beta is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method: we screen with univariate thresholding and clean with the penalized MLE. It has two important properties: sure screening and separable after screening. These properties let us reduce the original regression problem to many small regression problems that can be fitted separately. UPS is effective both in theory and in computation. We measure the performance of a procedure by the Hamming distance, and use an asymptotic framework in which p -> infinity while the other quantities (e.g., n, the sparsity level and the signal strength) are linked to p by fixed parameters. We find that in many cases UPS achieves the optimal rate of convergence. Moreover, for many different Omega there is a common three-phase diagram in the two-dimensional phase space quantifying signal sparsity and signal strength. In the first phase it is possible to recover all signals; in the second phase it is possible to recover most, but not all, of the signals; in the third phase successful variable selection is impossible. UPS partitions the phase space in the same way the optimal procedures do, and recovers most of the signals whenever successful variable selection is possible. The lasso and subset selection are well-known approaches to variable selection. Somewhat surprisingly, however, there are regions in the phase space where neither is rate optimal, even in very simple settings, such as when Omega is tridiagonal, and even when the tuning parameter is ideally set.
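The abstract describes UPS as a two-stage screen-and-clean procedure: screen with univariate thresholding of the marginal statistics X'y, then clean the survivors with a penalized MLE. A minimal NumPy sketch of that two-stage idea follows; the cleaning stage is simplified here to a least-squares refit with hard thresholding rather than the paper's penalized MLE, and all dimensions, thresholds, and signal strengths are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: p > n, a few strong signals (all settings are illustrative).
n, p, s = 200, 400, 3
X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))   # rows i.i.d. N(0, I_p / n)
beta = np.zeros(p)
signal_idx = np.sort(rng.choice(p, size=s, replace=False))
beta[signal_idx] = 25.0                               # strong, easy-to-detect signals
y = X @ beta + rng.normal(size=n)                     # noise z ~ N(0, I_n)

# --- Screen: univariate thresholding of the marginal statistics X'y ---
t = 2.0 * np.sqrt(2.0 * np.log(p))                    # illustrative threshold choice
scores = X.T @ y
survivors = np.flatnonzero(np.abs(scores) > t)

# --- Clean (simplified): least-squares refit on the survivors, then hard
# threshold the refitted coefficients; the paper cleans with a penalized MLE.
beta_hat = np.zeros(p)
if survivors.size:
    coef, *_ = np.linalg.lstsq(X[:, survivors], y, rcond=None)
    keep = np.abs(coef) > t / 2.0
    beta_hat[survivors[keep]] = coef[keep]

selected = np.flatnonzero(beta_hat)
# With signals this strong, `selected` typically coincides with `signal_idx`.
print("true signals:", signal_idx.tolist())
print("selected:    ", selected.tolist())
```

The screening stage is where the computational savings come from: after thresholding, only the survivors enter the (much smaller) second-stage fit, which is what makes the problem separable into small pieces.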
Pages: 73-103
Page count: 31
Related papers
50 records total
  • [21] PALLADIO: a parallel framework for robust variable selection in high-dimensional data
    Barbieri, Matteo
    Fiorini, Samuele
    Tomasi, Federico
    Barla, Annalisa
    PROCEEDINGS OF PYHPC2016: 6TH WORKSHOP ON PYTHON FOR HIGH-PERFORMANCE AND SCIENTIFIC COMPUTING, 2016, : 19 - 26
  • [22] Variable selection techniques after multiple imputation in high-dimensional data
    Zahid, Faisal Maqbool
    Faisal, Shahla
    Heumann, Christian
    Statistical Methods & Applications, 2020, 29 : 553 - 580
  • [23] LASSO-type variable selection methods for high-dimensional data
    Fu, Guanghui
    Wang, Pan
    ADVANCES IN COMPUTATIONAL MODELING AND SIMULATION, PTS 1 AND 2, 2014, 444-445 : 604 - 609
  • [24] Robust and consistent variable selection in high-dimensional generalized linear models
    Avella-Medina, Marco
    Ronchetti, Elvezio
    BIOMETRIKA, 2018, 105 (01) : 31 - 44
  • [25] VARIABLE SELECTION AND PREDICTION WITH INCOMPLETE HIGH-DIMENSIONAL DATA
    Liu, Ying
    Wang, Yuanjia
    Feng, Yang
    Wall, Melanie M.
    ANNALS OF APPLIED STATISTICS, 2016, 10 (01) : 418 - 450
  • [26] GREEDY VARIABLE SELECTION FOR HIGH-DIMENSIONAL COX MODELS
    Lin, Chien-Tong
    Cheng, Yu-Jen
    Ing, Ching-Kang
    STATISTICA SINICA, 2023, 33 : 1697 - 1719
  • [27] A Variable Selection Method for High-Dimensional Survival Data
    Giordano, Francesco
    Milito, Sara
    Restaino, Marialuisa
    MATHEMATICAL AND STATISTICAL METHODS FOR ACTUARIAL SCIENCES AND FINANCE, MAF 2022, 2022, : 303 - 308
  • [28] A stepwise regression algorithm for high-dimensional variable selection
    Hwang, Jing-Shiang
    Hu, Tsuey-Hwa
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2015, 85 (09) : 1793 - 1806
  • [29] Estimation and optimal structure selection of high-dimensional Toeplitz covariance matrix
    Yang, Yihe
    Zhou, Jie
    Pan, Jianxin
    JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 184
  • [30] Optimal Feature Selection in High-Dimensional Discriminant Analysis
    Kolar, Mladen
    Liu, Han
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2015, 61 (02) : 1063 - 1083