Outlier-resistant high-dimensional regression modelling based on distribution-free outlier detection and tuning parameter selection

Cited by: 10
Authors
Park, Heewon [1 ]
Affiliation
[1] Yamaguchi Univ, Fac Global & Sci Studies, 1677-1 Yoshida, Yamaguchi, Yamaguchi 7538511, Japan
Keywords
Distribution-free outlier detection; high-dimensional data; information criterion; L1-type regularization; robust regression modelling; information criteria; variable selection; sparse regression; oracle properties; adaptive lasso; robust
DOI
10.1080/00949655.2017.1287186
Chinese Library Classification (CLC)
TP39 [Computer applications]
Discipline Classification Codes
081203; 0835
Abstract
L1-type regularization is a useful tool for high-dimensional regression modelling. Although L1-type approaches perform well in regression modelling, they suffer from outliers, because they are based on non-robust loss functions (e.g. least squares). To resolve this drawback, we propose a robust L1-type regularization method based on a distribution-free outlier detection measure. We perform outlier detection in principal component spaces (PCSs) to overcome the dimensionality problem of high-dimensional data, and propose a novel cut-off value based on a non-parametric test. Using the distribution-free outlier detection measure, we can effectively detect outliers in the PCS without a distributional assumption on the Mahalanobis distance. We then propose a robust L1-type regularization method via a weighted elastic net. Tuning parameter selection is a vital matter in L1-type regularized regression modelling, since the choice of tuning parameters amounts to variable selection and model estimation. We derive an information criterion for selecting the tuning parameters of the proposed robust L1-type regularization method. Monte Carlo simulations and an analysis of NCI60 data show that the proposed robust regression modelling strategies perform effectively for high-dimensional regression modelling, even in the presence of outliers.
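The pipeline sketched in the abstract — score each observation in a low-dimensional principal component space, flag outliers with a cut-off, then fit an elastic net that downweights the flagged observations — can be illustrated as follows. This is a minimal sketch, not the paper's method: the median/MAD robust distance, the empirical 90% quantile (standing in for the paper's non-parametric cut-off), and the use of zero/one observation weights in scikit-learn's ElasticNet are all assumptions made for the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

# Contaminate the first five observations (leverage points + response outliers).
X[:5] += 8.0
y[:5] += 15.0

# Step 1: project into a low-dimensional principal component space (PCS).
Z = PCA(n_components=5).fit_transform(X)

# Step 2: robust distance in the PCS, standardizing each component by its
# median and MAD (an illustrative stand-in for a distribution-free measure).
med = np.median(Z, axis=0)
mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
d = np.sqrt((((Z - med) / mad) ** 2).sum(axis=1))

# Step 3: flag outliers with a cut-off; an empirical 90% quantile is used
# here in place of the paper's non-parametric cut-off value.
w = (d <= np.quantile(d, 0.9)).astype(float)

# Step 4: weighted elastic net -- flagged observations receive zero weight,
# so the L1/L2-penalized fit is driven by the clean observations only.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y, sample_weight=w)
```

In this sketch the contaminated rows are far from the bulk of the data in the PCS, so they receive zero weight and do not distort the penalized fit; in the paper the cut-off is derived from a non-parametric test rather than fixed at a quantile, and the tuning parameters (here `alpha`, `l1_ratio`) are chosen by an information criterion rather than set by hand.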
Pages: 1799-1812
Page count: 14