Big data analytics: integrating penalty strategies

Times cited: 21
Authors
Ahmed, S. Ejaz [1 ]
Yuzbasi, Bahadir [2 ]
Affiliations
[1] Brock Univ, Dept Math & Stat, St Catharines, ON L2S 3A1, Canada
[2] Inonu Univ, Dept Econometr, Malatya, Turkey
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Sparse regression models; penalty and shrinkage estimation; estimation strategies; Monte Carlo simulation;
DOI
10.1080/17509653.2016.1153252
CLC classification
C93 [Management]; O22 [Operations Research];
Discipline codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
We present efficient estimation and prediction strategies for the classical multiple regression model when the dimension of the parameter vector exceeds the number of observations. These strategies are motivated by penalty estimation and Stein-type estimation procedures. More specifically, we consider the estimation of regression parameters in sparse linear models when some of the predictors may have a very weak influence on the response of interest. In the high-dimensional setting, a number of variable selection techniques exist; however, they yield different subset models that may contain different numbers of predictors. Generally speaking, the least absolute shrinkage and selection operator (Lasso) produces an over-fitted model compared with its competitors, namely the smoothly clipped absolute deviation (SCAD) method and the adaptive Lasso (aLasso). Thus, prediction based only on a submodel selected by such methods is subject to selection bias. To minimize this inherited bias, we suggest combining two models to improve estimation and prediction performance. In the context of two competing models, where one model includes more predictors than the other owing to a relatively aggressive variable selection strategy, we investigate the relative performance of Stein-type shrinkage and penalty estimators. The shrinkage estimator significantly improves the prediction performance of submodels selected by existing Lasso-type variable selection methods. A Monte Carlo simulation study using the relative mean squared error (RMSE) criterion appraises the performance of the listed estimators. The proposed strategy is applied to the analysis of several real high-dimensional data sets.
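The combination idea in the abstract — pulling a full-model fit toward a submodel chosen by variable selection, with a data-driven Stein-type weight — can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the low-dimensional setting (n > p), the fixed "selected" submodel, the distance statistic T, and the positive-part Stein weight are all simplifying assumptions made for the sake of a short Monte Carlo RMSE comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 10                 # observations, predictors (assumed n > p for simplicity)
beta = np.zeros(p)
beta[:3] = [2.0, 1.5, 1.0]    # sparse truth: only three strong signals

def ols(X, y):
    # least-squares fit (stands in for any full-model estimator)
    return np.linalg.lstsq(X, y, rcond=None)[0]

reps = 500
active = [0, 1, 2]            # assumed submodel: selection kept the strong signals
mse_full = mse_sub = mse_shrink = 0.0

for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)

    b_full = ols(X, y)
    # submodel estimate: refit on the selected columns, zeros elsewhere
    b_sub = np.zeros(p)
    b_sub[active] = ols(X[:, active], y)

    # Stein-type shrinkage: move from the submodel back toward the full model,
    # with a weight driven by the distance between the two fits (illustrative form)
    d = b_full - b_sub
    T = n * (d @ d)                    # crude statistic for the restricted block
    k = p - len(active)                # number of coefficients set to zero
    w = max(0.0, 1.0 - (k - 2) / T)    # positive-part Stein weight
    b_shrink = b_sub + w * d

    mse_full += np.sum((b_full - beta) ** 2)
    mse_sub += np.sum((b_sub - beta) ** 2)
    mse_shrink += np.sum((b_shrink - beta) ** 2)

# RMSE of each strategy relative to the full-model fit (values < 1 mean improvement)
print(mse_sub / mse_full, mse_shrink / mse_full)
```

Because the assumed submodel here is correct, both the submodel and the shrinkage estimator beat the full-model fit; the paper's point is that when selection is imperfect, the shrinkage combination hedges between the two.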
Pages: 105-115
Page count: 11
References
27 records
[1]  
Ahmed S.E., 2014, PENALTY SHRINKAGE PR
[2]   Shrinkage, pretest and absolute penalty estimators in partially linear models [J].
Ahmed, S. Ejaz ;
Doksum, Kjell A. ;
Hossain, S. ;
You, Jinhong .
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2007, 49 (04) :435-454
[3]   LASSO and shrinkage estimation in Weibull censored regression models [J].
Ahmed, S. Ejaz ;
Hossain, Shakhawat ;
Doksum, Kjell A. .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2012, 142 (06) :1273-1284
[4]  
Aldahmani S., 2015, INT J STAT PROBABILI, V4, P61
[5]   GENERALIZED DOUBLE PARETO SHRINKAGE [J].
Armagan, Artin ;
Dunson, David B. ;
Lee, Jaeyong .
STATISTICA SINICA, 2013, 23 (01) :119-143
[6]  
Bhattacharya A., 2012, arXiv:1212.6088
[7]   SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR [J].
Bickel, Peter J. ;
Ritov, Ya'acov ;
Tsybakov, Alexandre B. .
ANNALS OF STATISTICS, 2009, 37 (04) :1705-1732
[8]   The horseshoe estimator for sparse signals [J].
Carvalho, Carlos M. ;
Polson, Nicholas G. ;
Scott, James G. .
BIOMETRIKA, 2010, 97 (02) :465-480
[9]   Least angle regression - Rejoinder [J].
Efron, B ;
Hastie, T ;
Johnstone, I ;
Tibshirani, R .
ANNALS OF STATISTICS, 2004, 32 (02) :494-499
[10]   Variable selection via nonconcave penalized likelihood and its oracle properties [J].
Fan, JQ ;
Li, RZ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1348-1360