FULLY EFFICIENT ROBUST ESTIMATION, OUTLIER DETECTION AND VARIABLE SELECTION VIA PENALIZED REGRESSION

Cited by: 20
Authors
Kong, Dehan [1 ]
Bondell, Howard D. [2 ]
Wu, Yichao [2 ]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON M5S 3G3, Canada
[2] North Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Adaptive; breakdown point; least trimmed squares; outliers; penalized regression; robust regression; variable selection; LEAST ANGLE REGRESSION; SQUARES REGRESSION; ORACLE PROPERTIES; MODEL SELECTION; HIGH BREAKDOWN; LASSO; LIKELIHOOD; SHRINKAGE;
DOI
10.5705/ss.202016.0441
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
This paper studies the outlier detection and variable selection problem in linear regression. A mean shift parameter is added to the linear model to reflect the effect of outliers, where an outlier has a nonzero shift parameter. We then apply an adaptive regularization to these shift parameters to shrink most of them to zero. Those observations with nonzero mean shift parameter estimates are regarded as outliers. An L1 penalty is added to the regression parameters to select important predictors. We propose an efficient algorithm to solve this jointly penalized optimization problem and use the extended Bayesian information criteria tuning method to select the regularization parameters, since the number of parameters exceeds the sample size. Theoretical results are provided in terms of high breakdown point, full efficiency, as well as outlier detection consistency. We illustrate our method with simulations and data. Our method is extended to high-dimensional problems with dimension much larger than the sample size.
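The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the objective (1/2) * sum_i (y_i - x_i'beta - gamma_i)^2 + lam_beta * sum_j |beta_j| + lam_gamma * sum_i w_i |gamma_i| and minimizes it by plain coordinate descent. The names `soft_threshold` and `mean_shift_fit`, the uniform default weights, the fixed penalty levels, and the toy data are all assumptions made for illustration. Adaptive weights would normally come from a robust initial fit, and tuning by the extended Bayesian information criterion is not shown.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def mean_shift_fit(X, y, lam_beta, lam_gamma, weights=None, n_sweeps=200):
    """Coordinate descent for an L1-penalized mean-shift regression (sketch).

    X is a list of rows, y a list of responses. `weights` are the adaptive
    weights on the shift parameters (assumed supplied; uniform by default).
    """
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * n
    beta = [0.0] * p
    gamma = [0.0] * n
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        # Update each regression coefficient with the others held fixed.
        for j in range(p):
            z = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                z += X[i][j] * (y[i] - gamma[i] - others)
            beta[j] = soft_threshold(z, lam_beta) / col_sq[j]
        # Update each mean shift: soft-threshold the current residual.
        for i in range(n):
            resid = y[i] - sum(X[i][k] * beta[k] for k in range(p))
            gamma[i] = soft_threshold(resid, lam_gamma * w[i])
    # Observations with a nonzero shift estimate are flagged as outliers.
    outliers = [i for i in range(n) if gamma[i] != 0.0]
    return beta, gamma, outliers


# Toy data (made up): y depends only on the first predictor,
# and observation 3 is shifted by +10.
X = [[i - 4.5, (-1.0) ** i] for i in range(10)]
y = [2.0 * row[0] for row in X]
y[3] += 10.0

beta, gamma, outliers = mean_shift_fit(X, y, lam_beta=3.0, lam_gamma=2.0)
print("beta:", beta)
print("flagged outliers:", outliers)
```

In this toy run the L1 penalty on the coefficients removes the irrelevant second predictor, and the penalty on the shifts leaves only the contaminated observation with a nonzero shift. With uniform weights and a non-robust start the sketch offers no protection against high-leverage outliers, which is the role the adaptive weights play in the abstract's description.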
Pages: 1031-1052
Number of pages: 22