High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence

Cited: 504
Authors
Ravikumar, Pradeep
Wainwright, Martin J.
Raskutti, Garvesh
Yu, Bin
Keywords
Covariance; concentration; precision; sparsity; Gaussian graphical models; ℓ1 regularization; SELECTION; MODEL; MATRICES; GRAPHS
DOI
10.1214/11-EJS631
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$ and its inverse covariance or concentration matrix $\Theta^* = (\Sigma^*)^{-1}$. When $X$ is multivariate Gaussian, the non-zero structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field, and a popular estimator for such a sparse $\Theta^*$ is the $\ell_1$-regularized Gaussian MLE. This estimator is sensible even for non-Gaussian $X$, since it corresponds to minimizing an $\ell_1$-penalized log-determinant Bregman divergence. We analyze its performance under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$, and the maximum node degree $d$ are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p, s, d)$, our analysis identifies other key quantities that control rates: (a) the $\ell_\infty$-operator norm of the true covariance matrix $\Sigma^*$; (b) the $\ell_\infty$-operator norm of the sub-matrix $\Gamma^*_{SS}$, where $S$ indexes the graph edges and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$; and (d) the rate of decay $1/f(n, \delta)$ on the probabilities $\{|\hat{\Sigma}^n_{ij} - \Sigma^*_{ij}| > \delta\}$, where $\hat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\hat{\Theta}$ in the elementwise maximum norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $d = o(\sqrt{s})$. In our second result, we show that, with probability converging to one, the estimate $\hat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondences between the theoretical predictions and behavior in simulations.
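As a concrete illustration of the estimator discussed in the abstract, the sketch below fits the $\ell_1$-penalized Gaussian MLE to synthetic data drawn from a sparse chain-graph precision matrix and reports the elementwise maximum-norm error and the recovered zero pattern. It is a minimal sketch, not code from the paper: it uses scikit-learn's GraphicalLasso solver, and the graph, dimensions, and regularization level are illustrative assumptions only.

```python
# Minimal sketch (not the authors' implementation): the abstract's estimator is
# the l1-penalized log-determinant program
#   Theta_hat = argmin_{Theta > 0} tr(Sigma_n Theta) - log det(Theta) + lambda_n * ||Theta||_{1,off},
# which scikit-learn's GraphicalLasso solves by coordinate descent. All problem
# sizes and the regularization level below are illustrative choices.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Sparse "true" precision matrix Theta* (a chain graph) and its inverse Sigma*.
p, n = 30, 400
theta_star = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma_star = np.linalg.inv(theta_star)

# Draw n i.i.d. samples X_1, ..., X_n ~ N(0, Sigma*).
X = rng.multivariate_normal(np.zeros(p), sigma_star, size=n)

# Fit the l1-regularized Gaussian MLE; alpha plays the role of lambda_n.
model = GraphicalLasso(alpha=0.05).fit(X)
theta_hat = model.precision_

# Elementwise maximum-norm error and recovered zero pattern: the two quantities
# addressed by the paper's consistency and sparsistency results.
off_diag = ~np.eye(p, dtype=bool)
print("max_ij |Theta_hat - Theta*| :", np.max(np.abs(theta_hat - theta_star)))
print("estimated nonzero off-diagonal entries:", int(np.sum(np.abs(theta_hat[off_diag]) > 1e-4)))
print("true nonzero off-diagonal entries:", int(np.sum(theta_star[off_diag] != 0)))
```

In this sketch, alpha corresponds to the regularization weight $\lambda_n$; larger values yield sparser estimates of $\Theta^*$ at the cost of more bias in the nonzero entries.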
Pages: 935-980
Page count: 46