The Smooth-Lasso and other l1 + l2-penalized methods

Cited by: 71
Authors
Hebiri, Mohamed [1 ]
van de Geer, Sara [2 ]
Affiliations
[1] Université Paris-Est Marne-la-Vallée, Dept. of Mathematics, F-77454 Champs-sur-Marne, Marne-la-Vallée, France
[2] ETH Zurich, Seminar for Statistics, HG G 24.1, CH-8092 Zurich, Switzerland
Keywords
Lasso; Elastic-Net; LARS; sparsity; variable selection; restricted eigenvalues; high-dimensional data; VARIABLE SELECTION; MODEL SELECTION; LOGISTIC-REGRESSION; ORACLE INEQUALITIES; DANTZIG SELECTOR; ELASTIC-NET; FUSED LASSO; SPARSITY; AGGREGATION; CONSISTENCY
DOI
10.1214/11-EJS638
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We consider a linear regression problem in a high-dimensional setting where the number of covariates $p$ can be much larger than the sample size $n$. In such a situation, one often assumes sparsity of the regression vector, i.e., that the regression vector contains many zero components. We propose a Lasso-type estimator $\hat{\beta}^{\mathrm{Quad}}$ (where 'Quad' stands for quadratic), which is based on two penalty terms. The first is the $\ell_1$ norm of the regression coefficients, used to exploit the sparsity of the regression as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{\mathrm{EN}}$, introduced in [42], which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso $\hat{\beta}^{\mathrm{SL}}$, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{\beta}^{\mathrm{Quad}}$ achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the 'true' regression vector. These results are obtained under a weaker assumption on the Gram matrix than the one used for the Lasso, which in some situations guarantees a significant improvement over the Lasso. Furthermore, a simulation study is conducted and shows that the S-Lasso $\hat{\beta}^{\mathrm{SL}}$ performs better, with respect to estimation accuracy, than known methods such as the Lasso, the Elastic-Net $\hat{\beta}^{\mathrm{EN}}$, and the Fused-Lasso (introduced in [30]). This is especially the case when the regression vector is 'smooth', i.e., when the variations between successive coefficients of the unknown regression parameter are small. The study also reveals that the theoretical calibration of the tuning parameters and the calibration based on 10-fold cross-validation yield two S-Lasso solutions with close performance.
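The abstract does not spell out an algorithm, so the following is a hedged illustration only: an $\ell_1$ + quadratic criterion of the kind described above can be folded into an ordinary Lasso problem by data augmentation and handed to any off-the-shelf Lasso solver. The sketch assumes the unscaled criterion $\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \lambda_2\|D\beta\|_2^2$, with $D$ the identity (Elastic-Net case) or the first-difference matrix (Smooth-Lasso case); scikit-learn's Lasso, the function name quad_penalized_lasso, and the toy data are illustrative assumptions, not the authors' implementation or the paper's exact normalization.

```python
# Minimal sketch of an l1 + quadratic penalty via data augmentation (assumed setup).
import numpy as np
from sklearn.linear_model import Lasso

def quad_penalized_lasso(X, y, lam1, lam2, smooth=True):
    """Minimize ||y - Xb||^2 + lam1*||b||_1 + lam2*||D b||^2 (hypothetical normalization)."""
    n, p = X.shape
    if smooth:
        # First-difference matrix: (D b)_j = b_{j+1} - b_j  -> Smooth-Lasso-type penalty
        D = np.diff(np.eye(p), axis=0)
    else:
        # Identity -> ridge-type quadratic term, i.e. the Elastic-Net case
        D = np.eye(p)
    # Augmentation: ||y - Xb||^2 + lam2*||Db||^2 = ||y_aug - X_aug b||^2
    X_aug = np.vstack([X, np.sqrt(lam2) * D])
    y_aug = np.concatenate([y, np.zeros(D.shape[0])])
    # scikit-learn's Lasso minimizes (1/(2m))*||y - Xb||^2 + alpha*||b||_1,
    # so alpha absorbs the 1/(2m) factor to match lam1 above.
    m = X_aug.shape[0]
    model = Lasso(alpha=lam1 / (2 * m), fit_intercept=False, max_iter=50000)
    model.fit(X_aug, y_aug)
    return model.coef_

# Toy usage: a sparse signal whose nonzero block varies slowly, the setting where
# the abstract reports the S-Lasso doing well.
rng = np.random.default_rng(0)
n, p = 50, 100
beta = np.zeros(p)
beta[10:20] = np.linspace(1.0, 2.0, 10)   # slowly varying block
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)
beta_hat = quad_penalized_lasso(X, y, lam1=5.0, lam2=10.0, smooth=True)
```

The tuning values above are hand-picked for illustration; in practice one would calibrate lam1 and lam2 jointly, e.g. by the 10-fold cross-validation mentioned in the abstract.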
Pages: 1184-1226
Number of pages: 43
References (43 in total)
[1] Anonymous. Journal of the Royal Statistical Society, Series B, 2006.
[2] Bach, F. R. Journal of Machine Learning Research, 2008, 9: 1179.
[3] Belloni, A. Post L1 penaliz... (unpublished), 2010.
[4] Bickel, Peter J.; Ritov, Ya'acov; Tsybakov, Alexandre B. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 2009, 37(4): 1705-1732.
[5] Bunea, Florentina; Tsybakov, Alexandre; Wegkamp, Marten. Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics, 2007, 1: 169-194.
[6] Bunea, Florentina; Tsybakov, Alexandre B.; Wegkamp, Marten H. Aggregation for Gaussian regression. Annals of Statistics, 2007, 35(4): 1674-1697.
[7] Bunea, Florentina. Honest variable selection in linear and logistic regression models via l1 and l1 + l2 penalization. Electronic Journal of Statistics, 2008, 2: 1153-1194.
[8] Bunea, Florentina. Pushing the Limits of Contemporary Statistics, 2008, 3: 122.
[9] Chesneau, C.; Hebiri, M. Some theoretical results on the Grouped Variables Lasso. Mathematical Methods of Statistics, 2008, 17(4): 317-326.
[10] Dalalyan, Arnak S.; Tsybakov, Alexandre B. Aggregation by exponential weighting and sharp oracle inequalities. Learning Theory, Proceedings (COLT), 2007, 4539: 97+.