Hub Discovery in Partial Correlation Graphs

Cited by: 51
Authors
Hero, Alfred [1, 2, 3]
Rajaratnam, Bala [4]
Affiliations
[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Biomed Engn, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[4] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation
Keywords
Asymptotic Poisson limits; correlation networks; discovery rate phase transitions; Gaussian graphical models (GGMs); nearest neighbor dependence; node degree and connectivity; p-value trajectories; gene-expression signature; covariance estimation
DOI
10.1109/TIT.2012.2200825
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the most important problems in large-scale inference is the identification of variables that are highly dependent on several other variables. When dependence is measured by partial correlations, these variables correspond to the rows of the partial correlation matrix that have several entries of large magnitude, i.e., hubs in the associated partial correlation graph. This paper develops theory and algorithms for discovering such hubs from a few observations of these variables. We introduce a hub screening framework in which the user specifies both a minimum (partial) correlation rho and a minimum degree delta to screen the vertices. The choice of rho and delta can be guided by our mathematical expressions for the phase transition correlation threshold rho_c governing the average number of discoveries. It can also be guided by our asymptotic expressions for familywise discovery rates under the assumptions of a large number of variables, a fixed number of multivariate samples, and weak dependence. Under the null hypothesis that the dispersion (covariance) matrix is sparse, these limiting expressions can be used to enforce familywise error constraints and to rank the discoveries in order of increasing statistical significance. For n << p, the proposed partial correlation screening method has low computational complexity and is therefore highly scalable. Thus, it can be applied to significantly larger problems than previous approaches. The theory is applied to discovering hubs in a high-dimensional gene microarray dataset.
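The screening step described in the abstract (keep edges whose magnitude is at least rho, then keep vertices whose degree is at least delta) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `hub_screen`, the use of NumPy, and in particular the partial correlation estimate, which here is formed from the Moore-Penrose pseudoinverse of the sample correlation matrix as a stand-in. The sketch also omits the p-value and phase-transition machinery that the abstract mentions.

```python
import numpy as np


def hub_screen(X, rho, delta, partial=False):
    """Hypothetical helper: screen variables for hubs.

    X       : (n, p) array, n samples of p variables.
    rho     : minimum |correlation| for an edge to be kept.
    delta   : minimum degree for a vertex to be reported as a hub.
    partial : if True, screen an assumed partial correlation estimate
              built from the pseudoinverse of the sample correlation.
    Returns (hubs, degree): hub indices and the degree of every vertex.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # p x p sample correlation

    if partial:
        # Assumption: pseudoinverse-based estimate, usable when n < p.
        P = np.linalg.pinv(R)
        d = np.sqrt(np.diag(P))
        R = -P / np.outer(d, d)               # off-diagonal partial correlations

    A = np.abs(R) >= rho                      # thresholded adjacency matrix
    np.fill_diagonal(A, False)                # ignore self-correlation
    degree = A.sum(axis=1)                    # vertex degrees
    hubs = np.flatnonzero(degree >= delta)    # vertices passing the degree screen
    return hubs, degree


if __name__ == "__main__":
    # Synthetic data: variables 0..4 share a common factor, the rest are noise.
    rng = np.random.default_rng(0)
    z = rng.standard_normal(50)
    X = rng.standard_normal((50, 20))
    X[:, 0] = z
    for j in range(1, 5):
        X[:, j] = z + 0.1 * rng.standard_normal(50)
    hubs, degree = hub_screen(X, rho=0.9, delta=3)
    print("hubs:", hubs, "degrees:", degree)
```

Raising rho or delta makes the screen stricter; the abstract indicates that the paper's expressions for rho_c and the familywise discovery rates are what should guide that choice in practice.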
Pages: 6064-6078
Number of pages: 15