ORACLE INEQUALITIES AND OPTIMAL INFERENCE UNDER GROUP SPARSITY

Cited: 220
Authors
Lounici, Karim [1 ]
Pontil, Massimiliano [2 ]
van de Geer, Sara [3 ]
Tsybakov, Alexandre B. [4 ]
Affiliations
[1] Georgia Inst Technol, Sch Math, Atlanta, GA 30332 USA
[2] UCL, Dept Comp Sci, London WC1E, England
[3] ETH, Seminar Stat, CH-8092 Zurich, Switzerland
[4] CREST, F-92240 Malakoff, France
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Oracle inequalities; group Lasso; minimax risk; penalized least squares; moment inequality; group sparsity; statistical learning; GROUP LASSO; SELECTION; RECOVERY; HETEROGENEITY; AGGREGATION; VARIABLES;
DOI
10.1214/11-AOS896
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of estimating a sparse linear regression vector beta* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern: the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate beta*. We establish oracle inequalities for the prediction and l(2) estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds on the estimation error for mixed (2, p)-norms with 1 <= p <= infinity. When p = infinity, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of beta* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and l(2) estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, for which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
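As a concrete illustration of the estimator discussed in the abstract, the following is a minimal sketch of Group Lasso penalized least squares solved by proximal gradient descent with blockwise soft-thresholding, followed by a thresholding step in the spirit of the sparsity pattern selection result. The solver, the toy group structure, the regularization level lam and the selection threshold are illustrative assumptions, not the authors' implementation or tuning.

# Minimal sketch (assumption: not the authors' code) of the Group Lasso estimator:
# penalized least squares with a sum of groupwise Euclidean norms, solved by
# proximal gradient descent (ISTA) with blockwise soft-thresholding.
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/n)*||y - X @ beta||^2 + 2*lam * sum_g ||beta_g||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part's gradient
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y) / n
        z = beta - grad / L
        for g in groups:                       # proximal step, one group at a time
            norm_g = np.linalg.norm(z[g])
            shrink = max(0.0, 1.0 - 2.0 * lam / (L * norm_g)) if norm_g > 0 else 0.0
            beta[g] = shrink * z[g]
    return beta

# Toy data: 10 groups of 4 variables, only groups 0 and 3 truly active.
rng = np.random.default_rng(0)
n = 100
groups = [np.arange(4 * j, 4 * (j + 1)) for j in range(10)]
X = rng.standard_normal((n, 40))
beta_star = np.zeros(40)
beta_star[groups[0]] = 1.0
beta_star[groups[3]] = -0.5
y = X @ beta_star + 0.1 * rng.standard_normal(n)

beta_hat = group_lasso(X, y, groups, lam=0.05)
# Thresholded Group Lasso: keep the groups whose estimated block norm exceeds a
# (hypothetical) threshold; this mimics the sparsity pattern selection step.
selected = [j for j, g in enumerate(groups) if np.linalg.norm(beta_hat[g]) > 0.25]
print(selected)   # expected to recover groups 0 and 3

With these illustrative settings the blockwise soft-thresholding sets inactive blocks exactly to zero, which is the mechanism behind selecting whole groups rather than individual coordinates.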
Pages: 2164-2204
Number of pages: 41
References
42 in total
[1] Aaker, D. (1995). Marketing Research, 9th ed.
[2] [Anonymous] (2002). Oxford Statistical Science Series.
[3] [Anonymous] (2006). Journal of the Royal Statistical Society, Series B.
[4] [Anonymous] (1995). Oxford Studies in Probability.
[5] [Anonymous] (2001). Econometric Analysis of Cross Section and Panel Data.
[6] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Machine Learning 73(3) 243-272.
[7] Bach, F. R. (2008). J. Mach. Learn. Res. 9 1179.
[8] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37(4) 1705-1732.
[9] Borwein, J. M. (2006). Convex Analysis and Nonlinear Optimization.
[10] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1 169-194.