Sparsity in penalized empirical risk minimization

被引:55
作者
Koltchinskii, Vladimir [1 ]
机构
[1] Georgia Inst Technol, Sch Math, Atlanta, GA 30332 USA
来源
ANNALES DE L INSTITUT HENRI POINCARE-PROBABILITES ET STATISTIQUES | 2009年 / 45卷 / 01期
关键词
Empirical risk; Penalized empirical risk; l(p)-penalty; Sparsity; Oracle inequalities; ORACLE INEQUALITIES; MODEL SELECTION; COMPLEXITIES; RECOVERY;
D O I
10.1214/07-AIHP146
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Let (X, Y) be a random couple in S x T with unknown distribution P. Let (X-1, Y-1),..., (X-n, Y-n) be i.i.d. copies of (X, Y). P-n being their emrical distribution. Let h(1),..., h(N) : S bar right arrow [- 1, 1] be a dictionary consisting of N functions. For lambda is an element of R-N. denote f(lambda) := Sigma(N)(j=1); lambda(j)h(j). Let l: T x R bar right aroow R be a given loss function, which is convex with respect to the second variable. Denote (l center dot f)(x, y) := l(y; f(x)). We study the following penalized empirical risk minimization problem (lambda) over cap (epsilon) := (lambda is an element of RN)argmin[P-n(l center dot f(lambda)) + epsilon parallel to lambda parallel to(p)(lp)], which is an empirical version of the problem (lambda) over cap (epsilon) :=argmin[P-n(l center dot f(lambda)) + epsilon parallel to lambda parallel to(p)(lp)] (here epsilon >= 0 is a regularization parameter; lambda(0) corresponds to epsilon = 0). A number of regression and classification problems fit this general framework. We are interested in the case when p >= 1, but it is close enough to 1 (so that p - 1 is of the order 1/log N, or smaller). We show that the "sparsity" of lambda(epsilon) implies the "sparsity" of (lambda) over cap (epsilon) and study the impact of "sparsity" on bounding the excess risk P(l center dot f((lambda) over cap epsilon)) - P(l center dot f(lambda 0)) of solutions of empirical risk minimization problems.
引用
收藏
页码:7 / 57
页数:51
相关论文
共 30 条
  • [1] [Anonymous], 2004, Statistical Learning Theory and Stochastic Optimization Ecole d'Ete de Probabilites de Saint-Flour XXXI-2001. Ecole d'Ete de Probabilites de Saint-Flour, 1851
  • [2] Risk bounds for model selection via penalization
    Barron, A
    Birgé, L
    Massart, P
    [J]. PROBABILITY THEORY AND RELATED FIELDS, 1999, 113 (03) : 301 - 413
  • [3] Local Rademacher complexities
    Bartlett, PL
    Bousquet, O
    Mendelson, S
    [J]. ANNALS OF STATISTICS, 2005, 33 (04) : 1497 - 1537
  • [4] Ben-Tal A, 2001, LECT MODERN CONVEX O, V2
  • [5] Sparsity oracle inequalities for the Lasso
    Bunea, Florentina
    Tsybakov, Alexandre
    Wegkamp, Marten
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2007, 1 : 169 - 194
  • [6] Aggregation for gaussian regression
    Bunea, Florentina
    Tsybakov, Alexandre B.
    Wegkamp, Marten H.
    [J]. ANNALS OF STATISTICS, 2007, 35 (04) : 1674 - 1697
  • [7] Candes E, 2005, ANN IEEE SYMP FOUND, P295
  • [8] Candes E, 2007, ANN STAT, V35, P2313, DOI 10.1214/009053606000001523
  • [9] Stable signal recovery from incomplete and inaccurate measurements
    Candes, Emmanuel J.
    Romberg, Justin K.
    Tao, Terence
    [J]. COMMUNICATIONS ON PURE AND APPLIED MATHEMATICS, 2006, 59 (08) : 1207 - 1223
  • [10] Donoho D., 2004, For most large underdetermined systems of linear equations the minimal L1-norm near solution approximates the sparsest solution