Variance estimation in high-dimensional linear models

Cited by: 65
Authors
Dicker, Lee H. [1]
Affiliations
[1] Rutgers State Univ, Dept Stat, Piscataway, NJ 08854 USA
Funding
National Science Foundation (USA);
Keywords
Asymptotic normality; Proportion of explained variation; Random matrix theory; Residual variance; Signal-to-noise ratio; OPTIMAL RATES; COVARIANCE; CONVERGENCE;
DOI
10.1093/biomet/ast065
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07; 0710; 09;
Abstract
The residual variance and the proportion of explained variation are important quantities in many statistical models and model fitting procedures. They play an important role in regression diagnostics and model selection procedures, as well as in determining the performance limits in many problems. In this paper we propose new method-of-moments-based estimators for the residual variance, the proportion of explained variation and other related quantities, such as the ℓ2 signal strength. The proposed estimators are consistent and asymptotically normal in high-dimensional linear models with Gaussian predictors and errors, where the number of predictors d is proportional to the number of observations n; in fact, consistency holds even in settings where d/n → ∞. Existing results on residual variance estimation in high-dimensional linear models depend on sparsity in the underlying signal. Our results require no sparsity assumptions and imply that the residual variance and the proportion of explained variation can be consistently estimated even when d > n and the underlying signal itself is nonestimable. Numerical work suggests that some of our distributional assumptions may be relaxed. A real-data analysis involving gene expression data and single nucleotide polymorphism data illustrates the performance of the proposed methods.
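The method-of-moments idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that this record does not state: the rows of X are i.i.d. N(0, I_d) (identity predictor covariance), the errors are N(0, σ²), and τ² = ‖β‖² denotes the signal strength. Under these assumptions E‖y‖² = n(τ² + σ²) and E‖Xᵀy‖² = n(n + d + 1)τ² + ndσ², and solving the two equations gives unbiased estimators. The function names and simulation settings are hypothetical, and the formulas may differ from the estimators analysed in the paper.

```python
import numpy as np


def mom_variance_estimates(X, y):
    """Method-of-moments estimates of (sigma^2, tau^2).

    Assumes rows of X are i.i.d. N(0, I_d); this identity-covariance
    setting is an assumption of the sketch, not taken from the record.
    """
    n, d = X.shape
    norm_y_sq = float(y @ y)                      # ||y||^2
    norm_xty_sq = float(np.sum((X.T @ y) ** 2))   # ||X^T y||^2
    scale = n * (n + 1)
    sigma2_hat = ((d + n + 1) * norm_y_sq - norm_xty_sq) / scale
    tau2_hat = (norm_xty_sq - d * norm_y_sq) / scale
    return sigma2_hat, tau2_hat


def simulate(n=2000, d=1000, sigma2=1.0, tau2=1.0, seed=0):
    """Draw (X, y) from a dense, non-sparse Gaussian linear model."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    beta = rng.standard_normal(d)
    beta *= np.sqrt(tau2) / np.linalg.norm(beta)  # enforce ||beta||^2 = tau2
    y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    sigma2_hat, tau2_hat = mom_variance_estimates(X, y)
    # Plug-in estimate of the proportion of explained variation.
    r2_hat = tau2_hat / (tau2_hat + sigma2_hat)
    print(f"sigma2_hat={sigma2_hat:.3f} tau2_hat={tau2_hat:.3f} r2_hat={r2_hat:.3f}")
```

Neither estimate requires fitting β, which is why the approach still applies when d > n. In finite samples either estimate can be negative, so a practical implementation might truncate at zero.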
Pages: 269-284
Number of pages: 16