Large covariance estimation by thresholding principal orthogonal complements

Cited by: 527
Authors
Fan, Jianqing [1]
Liao, Yuan [2]
Mincheva, Martina [3]
Affiliations
[1] Princeton University, Princeton, NJ 08544, USA
[2] University of Maryland, College Park, MD 20742, USA
[3] Princeton University, Princeton, NJ 08544, USA
Funding
Engineering and Physical Sciences Research Council (UK); National Institutes of Health (USA)
Keywords
Approximate factor model; Cross-sectional correlation; Diverging eigenvalues; High dimensionality; Low rank matrix; Principal components; Sparse matrix; Thresholding; Unknown factors; Dynamic-factor model; High-dimension; Matrix decomposition; Portfolio selection; Components-analysis; Largest eigenvalue; False discovery; Optimal rates; Large number; Consistency
DOI
10.1111/rssb.12016
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract
The paper deals with the estimation of a high-dimensional covariance matrix with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the principal orthogonal complement thresholding method, 'POET', to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as specific examples. We provide mathematical insight into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application to portfolio allocation is presented.
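The abstract describes POET only in words. As an illustration, here is a minimal Python/NumPy sketch of a POET-style estimator as we understand the construction: eigendecompose the sample covariance, keep the K leading spectral components as the low-rank part, and threshold the remaining principal orthogonal complement entry by entry. This is not the authors' code. Several choices are assumptions not stated in this record: the function name poet, the data layout (Y is T x p with one observation per row), a known K, soft thresholding, the entry-dependent threshold tau_ij = C * w_T * sqrt(r_ii * r_jj) with w_T = 1/sqrt(p) + sqrt(log(p)/T), and the placeholder default C = 0.5.

```python
import numpy as np


def poet(Y, K, C=0.5):
    """Hypothetical POET-style sketch (soft thresholding, correlation-based threshold).

    Y: (T, p) array with one observation per row (assumed layout).
    K: number of leading principal components kept (assumed known).
    C: threshold constant; 0.5 is a placeholder, not a recommended value.
    """
    Y = np.asarray(Y, dtype=float)
    T, p = Y.shape

    # Sample covariance with divisor T.
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / T

    # Eigendecomposition, reordered so eigenvalues are descending.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Low-rank part: sum over the K leading eigenpairs of lambda_i * xi_i xi_i'.
    V = vecs[:, :K]
    low_rank = (V * vals[:K]) @ V.T

    # Principal orthogonal complement of the sample covariance.
    R = S - low_rank

    # Entry-dependent threshold tau_ij = C * w_T * sqrt(r_ii * r_jj) (assumed form).
    w_T = 1.0 / np.sqrt(p) + np.sqrt(np.log(p) / T)
    d = np.sqrt(np.clip(np.diag(R), 0.0, None))
    tau = C * w_T * np.outer(d, d)

    # Soft-threshold the off-diagonal entries; keep the diagonal unchanged.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr


# Usage on simulated data from a K-factor model y_t = B f_t + u_t.
rng = np.random.default_rng(0)
T, p, K = 300, 50, 2
F = rng.standard_normal((T, K))   # latent factors
B = rng.standard_normal((p, K))   # factor loadings
U = rng.standard_normal((T, p))   # idiosyncratic errors
Y = F @ B.T + U
Sigma_hat = poet(Y, K=K)
```

Under these assumptions, some of the special cases named in the abstract follow from the two tuning inputs: C = 0 returns the sample covariance for any K, K = 0 gives a thresholding estimator applied directly to the sample covariance, and a very large C zeroes every off-diagonal residual entry, leaving a low-rank-plus-diagonal (factor-based) estimator.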
Pages: 603-680
Number of pages: 78
References (127 in total)
[1] Abadir, K. M., 2010, 1017 RIM CTR EC AN.
[2] Agarwal, Alekh; Negahban, Sahand; Wainwright, Martin J. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions [J]. Annals of Statistics, 2012, 40(2): 1171-1197.
[3] Ahn, SC; Lee, YH; Schmidt, P. GMM estimation of linear panel data models with time-varying individual effects [J]. Journal of Econometrics, 2001, 101(2): 219-255.
[4] Alessi, Lucia; Barigozzi, Matteo; Capasso, Marco. Improved penalization for determining the number of factors in approximate factor models [J]. Statistics & Probability Letters, 2010, 80(23-24): 1806-1813.
[5] Alon, U; Barkai, N; Notterman, DA; Gish, K; Ybarra, S; Mack, D; Levine, AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J]. Proceedings of the National Academy of Sciences of the United States of America, 1999, 96(12): 6745-6750.
[6] Amini, Arash A.; Wainwright, Martin J. High-dimensional analysis of semidefinite relaxations for sparse principal components [J]. Annals of Statistics, 2009, 37(5B): 2877-2921.
[7] [Anonymous], ARXIV12112671.
[8] [Anonymous], 2012, Technical report.
[9] Antoniadis, A; Fan, J. Regularization of wavelet approximations: Rejoinder [J]. Journal of the American Statistical Association, 2001, 96(455): 964-967.
[10] Antoniadis, Anestis. Wavelet methods in statistics: Some recent developments and their applications [J]. Statistics Surveys, 2007, 1: 16-55.