Testing a single regression coefficient in high dimensional linear models

Cited by: 16
Authors
Lan, Wei [1, 2]
Zhong, Ping-Shou [3]
Li, Runze [4, 5]
Wang, Hansheng [6]
Tsai, Chih-Ling [7]
Affiliations
[1] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[3] Michigan State Univ, Dept Stat & Probabil, E Lansing, MI 48823 USA
[4] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[5] Penn State Univ, Methodol Ctr, University Pk, PA 16802 USA
[6] Peking Univ, Guanghua Sch Management, Dept Business Stat & Econometr, Beijing 100871, Peoples R China
[7] Univ Calif Davis, Grad Sch Management, Davis, CA 95616 USA
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
Correlated Predictors Screening; False discovery rate; High dimensional data; Single coefficient test; FALSE DISCOVERY RATES; VARIABLE SELECTION; LASSO; CONSISTENCY; DEPENDENCE; GRAPHS;
DOI
10.1016/j.jeconom.2016.05.016
Chinese Library Classification number
F [Economics];
Discipline classification code
02 ;
Abstract
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. (C) 2016 Elsevier B.V. All rights reserved.
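The pipeline summarized in the abstract (screen the predictors most correlated with the target covariate, fit ordinary least squares on the reduced set, apply a z-test, then control the false discovery rate) can be illustrated with a minimal Python sketch. The abstract does not say how many correlated predictors are kept or which variance estimator is used, so the following are assumptions made only for illustration: a fixed top-k screening rule, an HC0 sandwich variance for heteroscedastic errors, the Benjamini-Hochberg step-up procedure for the FDR step, and the hypothetical function names `cps_z_test` and `benjamini_hochberg`. It is not the authors' implementation.

```python
import math
import numpy as np


def cps_z_test(X, y, j, k):
    """Z-test for coefficient j, controlling for its k most correlated predictors."""
    Xc = X - X.mean(axis=0)           # center columns so no intercept is needed
    yc = y - y.mean()
    xj = Xc[:, j]
    # Absolute sample correlation between the target covariate and each column.
    norms = np.linalg.norm(Xc, axis=0)
    corr = np.abs(Xc.T @ xj) / (norms * norms[j])
    corr[j] = -np.inf                 # never select the target itself
    S = np.argsort(corr)[::-1][:k]    # assumed rule: keep the top-k predictors
    # OLS of y on the target plus the screened set (needs k + 1 < n).
    Z = np.column_stack([xj, Xc[:, S]])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ Z.T @ yc
    resid = yc - Z @ beta
    # HC0 sandwich variance: an assumed choice to allow heteroscedastic errors.
    meat = (Z * resid[:, None] ** 2).T @ Z
    cov = ZtZ_inv @ meat @ ZtZ_inv
    z = beta[0] / math.sqrt(cov[0, 0])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta[0], z, p_value


def benjamini_hochberg(p_values, q):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.nonzero(below)[0].max()   # largest rank passing the threshold
        reject[order[: cutoff + 1]] = True
    return reject
```

With a design matrix `X` of shape (n, p) and a response vector `y`, calling `cps_z_test(X, y, j, k)` for each column index `j` yields one p-value per covariate, and `benjamini_hochberg(p_values, q)` then marks which covariates are declared significant at FDR level `q`.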
Pages: 154-168
Number of pages: 15
Related papers
36 in total
[1]   Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain [J].
Belloni, A. ;
Chen, D. ;
Chernozhukov, V. ;
Hansen, C. .
ECONOMETRICA, 2012, 80 (06) :2369-2429
[2]   Inference on Treatment Effects after Selection among High-Dimensional Controls [J].
Belloni, Alexandre ;
Chernozhukov, Victor ;
Hansen, Christian .
REVIEW OF ECONOMIC STUDIES, 2014, 81 (02) :608-650
[3]  
Bendat J.S., 1966, MEASUREMENT ANAL RAN, DOI 10.1080/00207176608921391
[4]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   Regularized estimation of large covariance matrices [J].
Bickel, Peter J. ;
Levina, Elizaveta .
ANNALS OF STATISTICS, 2008, 36 (01) :199-227
[6]   Statistical significance in high-dimensional linear models [J].
Buehlmann, Peter .
BERNOULLI, 2013, 19 (04) :1212-1242
[7]   Consistent variable selection in high dimensional regression via multiple testing [J].
Bunea, Florentina ;
Wegkamp, Marten H. ;
Auguste, Anna .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2006, 136 (12) :4349-4364
[8]   High dimensional variable selection via tilting [J].
Cho, Haeran ;
Fryzlewicz, Piotr .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2012, 74 :593-622
[9]  
Cook R.D., 1998, RESIDUALS INFLUENCE
[10]  
Draper N.R., 1998, Applied Regression Analysis, V326