EXPONENTIAL SCREENING AND OPTIMAL RATES OF SPARSE ESTIMATION

Cited by: 108
Authors
Rigollet, Philippe [1]
Tsybakov, Alexandre [2]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] CREST, Stat Lab, F-92240 Malakoff, France
Funding
U.S. National Science Foundation (NSF)
Keywords
High-dimensional regression; aggregation; adaptation; sparsity; sparsity oracle inequalities; minimax rates; Lasso; BIC
Keywords Plus
RESTRICTED ISOMETRY PROPERTY; VARIABLE SELECTION; DANTZIG SELECTOR; AGGREGATION; RISK; BOUNDS; LASSO
DOI
10.1214/10-AOS854
CLC classification codes
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
In high-dimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. A key assumption behind the success of any statistical procedure in this setup is that the linear combination is sparse in some sense, for example, that it involves only a few covariates. We consider a general, not necessarily linear, regression model with Gaussian noise and study a related question: finding a linear combination of approximating functions that is simultaneously sparse and has small mean squared error (MSE). We introduce a new estimation procedure, called Exponential Screening, that exhibits remarkable adaptation properties. It adapts to the linear combination that optimally balances MSE and sparsity, whether the latter is measured by the number of nonzero entries in the combination (its ℓ0 norm) or by the global weight of the combination (its ℓ1 norm). The power of this adaptation result is illustrated by showing that Exponential Screening solves optimally and simultaneously all the problems of aggregation in Gaussian regression that have been discussed in the literature. Moreover, we show that the performance of the Exponential Screening estimator cannot be improved in a minimax sense, even if the optimal sparsity is known in advance. The theoretical and numerical superiority of Exponential Screening over state-of-the-art sparse procedures is also discussed.
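The abstract describes Exponential Screening only at a high level. As a rough illustration of the underlying idea, exponentially weighted aggregation of least-squares fits over sparsity patterns, here is a minimal Python sketch. Everything in it beyond that idea is an assumption made for illustration: the function name exponential_screening_sketch, the brute-force enumeration capped at max_support, and the simplified prior M^(-|S|) over supports of size |S| are not the paper's exact construction.

```python
# Minimal sketch (not the paper's exact estimator): aggregate
# least-squares fits over small supports with exponential weights.
import itertools
import numpy as np

def exponential_screening_sketch(X, y, sigma2, max_support=3):
    """Exponential-weights aggregation of restricted least-squares fits;
    a simplified, hypothetical stand-in for Exponential Screening."""
    n, M = X.shape
    thetas, logw = [], []
    all_supports = itertools.chain.from_iterable(
        itertools.combinations(range(M), k) for k in range(max_support + 1))
    for S in all_supports:
        theta = np.zeros(M)
        if S:
            cols = list(S)
            # Least-squares fit restricted to the covariates in S.
            theta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
        rss = float(np.sum((y - X @ theta) ** 2))
        # Data-fit term at temperature 4*sigma^2, plus a log-prior
        # M^(-|S|) that penalizes larger supports (illustrative choice).
        logw.append(-rss / (4.0 * sigma2) - len(S) * np.log(M))
        thetas.append(theta)
    logw = np.asarray(logw)
    w = np.exp(logw - logw.max())    # subtract max for numerical stability
    w /= w.sum()
    return np.asarray(thetas).T @ w  # convex combination of the fits

# Tiny usage example: a 2-sparse truth is recovered approximately.
rng = np.random.default_rng(0)
n, M = 50, 8
X = rng.standard_normal((n, M))
theta_star = np.zeros(M)
theta_star[[1, 4]] = [2.0, -1.5]
y = X @ theta_star + rng.standard_normal(n)
print(np.round(exponential_screening_sketch(X, y, sigma2=1.0), 3))
```

For large M this exhaustive enumeration is infeasible; the paper's numerical section instead approximates the aggregate by a Metropolis-Hastings random walk over sparsity patterns, which the sketch above does not attempt.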
Pages: 731-771
Page count: 41