Robust linear regression for high-dimensional data: An overview

Cited by: 51
Authors
Filzmoser, Peter [1 ]
Nordhausen, Klaus [1 ]
Affiliations
[1] Vienna Univ Technol, Inst Stat & Math Methods Econ, Wiedner Hauptstr 8-10, A-1040 Vienna, Austria
Keywords
dimension reduction; high-dimensional data; outlier; regression; sparsity; least-squares regression; variable selection; ridge regression; sparse estimators; projection; shrinkage; outliers
DOI
10.1002/wics.1524
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Digitization, the process of converting information into numbers, leads to bigger and more complex data sets, bigger also with respect to the number of measured variables. This makes it harder or even impossible for the practitioner to identify outliers, that is, observations that are inconsistent with an underlying model. Classical least-squares-based procedures can be strongly affected by such outliers. In the regression context, this means that the parameter estimates are biased, with consequences for the validity of the statistical inference, for regression diagnostics, and for the prediction accuracy. Robust regression methods aim at assigning appropriate weights to observations that deviate from the model. While robust regression techniques are widely known in the low-dimensional case, researchers and practitioners may still be unfamiliar with developments in this direction for high-dimensional data. Recently, various strategies have been proposed for robust regression in the high-dimensional case, typically based on dimension reduction, on shrinkage (including sparsity), or on combinations of such techniques. A very recent concept is to downweight single cells of the data matrix rather than complete observations, with the goal of making better use of the model-consistent information and thus achieving higher efficiency of the parameter estimates.
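To make the weighting idea from the abstract concrete, here is a minimal sketch (not the specific estimators surveyed in the paper) of a Huber-type M-estimator fitted by iteratively reweighted least squares in plain NumPy. The simulated data, the tuning constant `c = 1.345`, and the helper name `huber_irls` are assumptions chosen for illustration; observations with large residuals receive weights below 1, which is exactly the "appropriate weights to observations that deviate from the model" described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)
y[:10] += 15.0  # contaminate 10% of responses with vertical outliers

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimator via iteratively reweighted least squares (illustrative sketch)."""
    Xd = np.column_stack([np.ones(len(y)), X])            # design matrix with intercept
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]          # OLS starting value
    for _ in range(n_iter):
        r = y - Xd @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust residual scale (MAD)
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))     # Huber weights: downweight large residuals
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)[0]
    return beta

beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
beta_rob = huber_irls(X, y)
# With 10% of responses shifted by +15, the OLS intercept is pulled upward,
# while the reweighted fit stays close to the uncontaminated model.
```

Note that this classical M-estimation scheme handles rowwise outliers in low dimensions only; the sparse and cellwise extensions surveyed in the paper modify both the loss and the penalization to cope with p > n.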
Pages: 18
References
58 records in total
[1] Alfons A., Croux C., Gelper S. Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Annals of Applied Statistics, 2013, 7(1): 226-248.
[2] Alin A., Agostinelli C. Robust iteratively reweighted SIMPLS. Journal of Chemometrics, 2017, 31(3).
[3] Alqallaf F., Van Aelst S., Yohai V.J., Zamar R.H. Propagation of outliers in multivariate data. Annals of Statistics, 2009, 37(1): 311-331.
[4] Rousseeuw P.J., Leroy A.M. Robust Regression and Outlier Detection. 2003.
[5] Hampel F.R., Ronchetti E.M., Rousseeuw P.J., Stahel W.A. Robust Statistics: The Approach Based on Influence Functions. 2011. DOI 10.1002/9781118186435.
[6] Arslan O. Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Computational Statistics & Data Analysis, 2012, 56(6): 1952-1965.
[7] Basu A., Shioya H., Park C. Statistical Inference: The Minimum Distance Approach. 2011. DOI 10.1201/b10956.
[8] Bottmer L., 2020, SPARSE REGRESSION LA
[9] Chang L., Roberts S., Welsh A. Robust Lasso regression using Tukey's biweight criterion. Technometrics, 2018, 60(1): 36-47.
[10] Chun H., Keles S. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society Series B (Statistical Methodology), 2010, 72: 3-25.