SPARSE LEAST TRIMMED SQUARES REGRESSION FOR ANALYZING HIGH-DIMENSIONAL LARGE DATA SETS

Cited by: 155
Authors
Alfons, Andreas [1]
Croux, Christophe [1]
Gelper, Sarah [2]
Affiliations
[1] Katholieke Univ Leuven, Fac Business & Econ, ORSTAT Res Ctr, B-3000 Louvain, Belgium
[2] Erasmus Univ, Rotterdam Sch Management, NL-3000 Rotterdam, Netherlands
Keywords
Breakdown point; outliers; penalized regression; robust regression; trimming; variable selection; model selection; lasso; shrinkage
DOI
10.1214/12-AOAS575
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
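
To make the construction in the abstract concrete, the following is a minimal sketch of an L1-penalized LTS objective: the sum of the h smallest squared residuals plus a penalty on the absolute coefficients. The function name `sparse_lts_objective`, the argument names, the omission of an intercept, and the scaling of the penalty by h are assumptions made for illustration; the abstract itself does not state them. The sketch only evaluates the objective for a given coefficient vector and is not the fast algorithm mentioned in the abstract.

```python
def sparse_lts_objective(X, y, beta, h, lam):
    """Evaluate an L1-penalized least trimmed squares objective.

    X    : list of rows, each a list of predictor values (hypothetical layout)
    y    : list of responses
    beta : list of coefficients (no intercept in this sketch)
    h    : number of observations kept after trimming
    lam  : penalty parameter (scaling by h is an assumption)
    """
    # Squared residual of each observation under the candidate coefficients.
    squared_residuals = [
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    ]
    # Trimming: keep only the h smallest squared residuals, so that
    # observations with very large residuals do not enter the loss.
    trimmed_loss = sum(sorted(squared_residuals)[:h])
    # L1 penalty on the coefficients, which encourages sparse estimates.
    penalty = h * lam * sum(abs(bj) for bj in beta)
    return trimmed_loss + penalty


# Toy data: the last observation is a gross outlier.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 100.0]
trimmed_value = sparse_lts_objective(X, y, beta=[1.0], h=3, lam=0.5)
untrimmed_value = sparse_lts_objective(X, y, beta=[1.0], h=4, lam=0.5)
```

With h = 3 the outlying observation is trimmed away and only the penalty term contributes for this coefficient vector, whereas with h = 4 the outlier dominates the objective.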
Pages: 226-248
Number of pages: 23