SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION

Cited by: 203
Authors
Obozinski, Guillaume [1 ,2 ]
Wainwright, Martin J. [2 ]
Jordan, Michael I. [2 ]
Affiliations
[1] INRIA Willow Project Team, Laboratoire d'Informatique, Ecole Normale Superieure, Paris, France
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
Lasso; block norm; second-order cone program; sparsity; variable selection; multivariate regression; high-dimensional scaling; simultaneous Lasso; group Lasso; sparsity recovery; model selection; consistency
DOI
10.1214/09-AOS776
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix $B^* \in \mathbb{R}^{p \times K}$ of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the $\ell_1/\ell_2$ norm is used for support union recovery, or recovery of the set of s rows for which $B^*$ is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter $\theta(n, p, s) := n/[2\psi(B^*)\log(p - s)]$. Here n is the sample size, and $\psi(B^*)$ is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that $\theta(n, p, s)$ exceeds a critical level $\theta_u$, and fails for sequences such that $\theta(n, p, s)$ lies below a critical level $\theta_\ell$. For the special case of the standard Gaussian ensemble, we show that $\theta_\ell = \theta_u$, so that the characterization is sharp. The sparsity-overlap function $\psi(B^*)$ reveals that, if the design is uncorrelated on the active rows, $\ell_1/\ell_2$ regularization for multivariate regression never harms performance relative to an ordinary Lasso approach and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
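The estimator described in the abstract can be reproduced with off-the-shelf tools. The sketch below, a minimal simulation rather than the authors' implementation, fits the multivariate group Lasso via scikit-learn's MultiTaskLasso, which applies the $\ell_1/\ell_2$ block penalty across the rows of the coefficient matrix, and checks whether the support union (the s nonzero rows of $B^*$) is exactly recovered. The dimensions n, p, s, K, the noise level, and the regularization weight alpha are illustrative assumptions, not values from the paper.

```python
# Minimal support-union-recovery simulation (illustrative sketch, not the
# authors' code). Problem sizes and alpha below are arbitrary choices.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, s, K = 200, 500, 10, 5          # samples, covariates, row sparsity, tasks

# Standard Gaussian ensemble: i.i.d. N(0, 1) design entries.
X = rng.standard_normal((n, p))

# B* has s active rows; each active row is a random K-vector.
B_star = np.zeros((p, K))
support = rng.choice(p, size=s, replace=False)
B_star[support] = rng.standard_normal((s, K))

# Noisy multivariate responses.
Y = X @ B_star + 0.25 * rng.standard_normal((n, K))

# l1/l2 block regularization: alpha weights the sum of row-wise l2 norms.
# In practice alpha would be tuned, e.g. by cross-validation.
model = MultiTaskLasso(alpha=0.1, max_iter=5000).fit(X, Y)
B_hat = model.coef_.T                  # shape (p, K), matching B*

recovered = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-8)
print("exact support union recovery:",
      set(recovered.tolist()) == set(support.tolist()))
```

Sweeping n for fixed (p, s, K) and plotting the empirical probability of exact recovery against the rescaled sample size $\theta(n, p, s)$ is how the threshold behavior described in the abstract would show up in such a simulation.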
Pages: 1-47
Page count: 47