Elastic-net regularization in learning theory

Times cited: 204
Authors
De Mol, Christine [2 ,3 ]
De Vito, Ernesto [4 ,5 ]
Rosasco, Lorenzo [1 ,6 ]
Affiliations
[1] MIT, Ctr Biol & Computat Learning, Cambridge, MA 02139 USA
[2] Univ Libre Bruxelles, Dept Math, B-1050 Brussels, Belgium
[3] Univ Libre Bruxelles, ECARES, B-1050 Brussels, Belgium
[4] Univ Genoa, Dipartimento Sci Architettura, I-16123 Genoa, Italy
[5] Ist Nazl Fis Nucl, Sez Genova, I-16146 Genoa, Italy
[6] Univ Genoa, Dipartimento Informat & Sci Informaz, I-16146 Genoa, Italy
Keywords
Learning; Regularization; Sparsity; Elastic net; Adaptive estimation; Model selection; Vector; Algorithms; Regression; Lasso
DOI
10.1016/j.jco.2009.01.002
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Within the framework of statistical learning theory, we analyze in detail the so-called elastic-net regularization scheme proposed by Zou and Hastie [H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B 67(2) (2005) 301-320] for the selection of groups of correlated variables. To investigate the statistical properties of this scheme, and in particular its consistency properties, we set up a suitable mathematical framework. Our setting is random-design regression, where we allow the response variable to be vector-valued and consider prediction functions that are linear combinations of elements (features) of an infinite-dimensional dictionary. Under the assumption that the regression function admits a sparse representation on the dictionary, we prove that there exists a particular "elastic-net representation" of the regression function such that, as the number of data points increases, the elastic-net estimator is consistent not only for prediction but also for variable/feature selection. Our results include finite-sample bounds and an adaptive scheme for selecting the regularization parameter. Moreover, using convex analysis tools, we derive an iterative thresholding algorithm for computing the elastic-net solution which differs from the optimization procedure originally proposed in the above-cited work. (C) 2009 Elsevier Inc. All rights reserved.
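The iterative thresholding algorithm mentioned in the abstract is not reproduced in this record. As a rough illustration of the general idea only, the sketch below implements a standard proximal-gradient (iterative soft-thresholding) scheme for the finite-dimensional elastic-net problem min_b (1/(2n))||Xb - y||^2 + lam1*||b||_1 + lam2*||b||_2^2. The (lam1, lam2) parametrization, step size, and stopping rule are generic assumptions, not the authors' exact formulation.

    import numpy as np

    def elastic_net_ista(X, y, lam1, lam2, n_iter=1000, tol=1e-8):
        """Proximal-gradient sketch for the elastic-net problem
        min_b (1/(2n))||Xb - y||^2 + lam1*||b||_1 + lam2*||b||_2^2.
        Illustrative only; not the paper's exact parametrization."""
        n, d = X.shape
        # Step size from the Lipschitz constant of the gradient
        # of the smooth least-squares term.
        L = np.linalg.norm(X, 2) ** 2 / n
        eta = 1.0 / L
        b = np.zeros(d)
        for _ in range(n_iter):
            # Gradient step on the least-squares term.
            v = b - eta * (X.T @ (X @ b - y)) / n
            # Proximal step: soft-thresholding (l1 term),
            # followed by shrinkage (l2 term).
            b_new = np.sign(v) * np.maximum(np.abs(v) - eta * lam1, 0.0)
            b_new /= 1.0 + 2.0 * eta * lam2
            if np.linalg.norm(b_new - b) <= tol * max(1.0, np.linalg.norm(b)):
                return b_new
            b = b_new
        return b

    # Tiny usage example with a sparse ground truth.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    b_true = np.zeros(50)
    b_true[:3] = [2.0, -1.5, 1.0]
    y = X @ b_true + 0.1 * rng.standard_normal(200)
    b_hat = elastic_net_ista(X, y, lam1=0.05, lam2=0.01)
    print("recovered support:", np.flatnonzero(np.abs(b_hat) > 1e-3))

The soft-thresholding step promotes sparsity (the l1 part of the penalty), while the subsequent shrinkage by 1/(1 + 2*eta*lam2) reflects the l2 part, which is what allows groups of correlated variables to be selected together rather than arbitrarily picking one representative.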
Pages: 201-230
Page count: 30