Boosting for high-dimensional linear models

Cited by: 262
Author
Bühlmann, Peter [1]
Affiliation
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
Source
ANNALS OF STATISTICS, 2006, 34 (02)
Keywords
binary classification; gene expression; Lasso; matching pursuit; over-complete dictionary; sparsity; variable selection; weak greedy algorithm
DOI
10.1214/009053606000000092
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ1-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate L2Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
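The procedure the abstract summarizes can be illustrated with a minimal sketch of componentwise L2Boosting with an AIC-type stopping rule. This is not the paper's implementation. The function name `l2boost`, the step size `nu`, the iteration cap `max_iter`, the requirement that inputs are already centered, and the specific corrected-AIC formula (which uses the trace of the boosting hat operator as degrees of freedom) are all assumptions made for illustration.

```python
import math


def l2boost(X, y, nu=0.1, max_iter=50):
    """Hypothetical sketch of componentwise L2Boosting with AIC-type stopping.

    X: list of n rows, each a list of p floats (columns assumed centered).
    y: list of n floats (assumed centered, so no intercept is fitted).
    Returns (coef, best_m, aicc_path).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    norms = [sum(v * v for v in c) for c in cols]
    coef = [0.0] * p
    resid = list(y)
    # R = I - B_m, where B_m maps y to the fitted values after m steps.
    R = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    best_aicc, best_m, best_coef = math.inf, 0, list(coef)
    aicc_path = []
    for m in range(1, max_iter + 1):
        # Pick the single predictor that best fits the current residuals.
        best_j, best_gain, best_b = -1, -1.0, 0.0
        for j in range(p):
            if norms[j] == 0.0:
                continue
            ip = sum(c * r for c, r in zip(cols[j], resid))
            gain = ip * ip / norms[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, ip / norms[j]
        if best_j < 0:
            break
        x = cols[best_j]
        # Shrunken update of the selected coefficient and the residuals.
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * xi for r, xi in zip(resid, x)]
        # Update R <- (I - nu * x x^T / ||x||^2) R to track degrees of freedom.
        xtR = [sum(x[i] * R[i][k] for i in range(n)) for k in range(n)]
        for i in range(n):
            f = nu * x[i] / norms[best_j]
            row = R[i]
            for k in range(n):
                row[k] -= f * xtR[k]
        df = n - sum(R[i][i] for i in range(n))  # trace(B_m)
        rss = sum(r * r for r in resid)
        if rss <= 0.0 or df + 2.0 >= n:
            break  # criterion undefined beyond this point
        # Assumed corrected-AIC form; lower is better.
        aicc = math.log(rss / n) + (1.0 + df / n) / (1.0 - (df + 2.0) / n)
        aicc_path.append(aicc)
        if aicc < best_aicc:
            best_aicc, best_m, best_coef = aicc, m, list(coef)
    return best_coef, best_m, aicc_path
```

A single run yields the whole criterion path, so the stopping iteration `best_m` is read off without refitting, which is the computational advantage over cross-validation that the abstract points to. The sketch stores an n-by-n matrix and is meant for small examples only.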
Pages: 559 - 583
Number of pages: 25
Related papers
50 in total
  • [1] EARLY STOPPING FOR L2-BOOSTING IN HIGH-DIMENSIONAL LINEAR MODELS
    Stankewitz, Bernhard
    ANNALS OF STATISTICS, 2024, 52 (02) : 491 - 518
  • [2] On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models
    Heidi Seibold
    Christoph Bernau
    Anne-Laure Boulesteix
    Riccardo De Bin
    Computational Statistics, 2018, 33 : 1195 - 1215
  • [3] On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models
    Seibold, Heidi
    Bernau, Christoph
    Boulesteix, Anne-Laure
    De Bin, Riccardo
    COMPUTATIONAL STATISTICS, 2018, 33 (03) : 1195 - 1215
  • [4] Boosting for high-multivariate responses in high-dimensional linear regression
    Lutz, Roman Werner
    Buehlmann, Peter
    STATISTICA SINICA, 2006, 16 (02) : 471 - 494
  • [5] Stable prediction in high-dimensional linear models
    Lin, Bingqing
    Wang, Qihua
    Zhang, Jun
    Pang, Zhen
    STATISTICS AND COMPUTING, 2017, 27 (05) : 1401 - 1412
  • [6] Stable prediction in high-dimensional linear models
    Bingqing Lin
    Qihua Wang
    Jun Zhang
    Zhen Pang
    Statistics and Computing, 2017, 27 : 1401 - 1412
  • [7] High-dimensional generalized linear models and the lasso
    van de Geer, Sara A.
    ANNALS OF STATISTICS, 2008, 36 (02) : 614 - 645
  • [8] Simultaneous Inference for High-Dimensional Linear Models
    Zhang, Xianyang
    Cheng, Guang
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (518) : 757 - 768
  • [9] Statistical significance in high-dimensional linear models
    Buehlmann, Peter
    BERNOULLI, 2013, 19 (04) : 1212 - 1242
  • [10] Variance estimation in high-dimensional linear models
    Dicker, Lee H.
    BIOMETRIKA, 2014, 101 (02) : 269 - 284