Covariate-assisted spectral clustering

Cited by: 85
Authors
Binkiewicz, N. [1]
Vogelstein, J. T. [2]
Rohe, K. [1]
Affiliations
[1] Univ Wisconsin, Dept Stat, 1300 Univ Ave, Madison, WI 53706 USA
[2] Johns Hopkins Univ, Dept Biomed Engn, 720 Rutland Ave, Baltimore, MD 21205 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Brain graph; Laplacian; Network; Node attribute; Stochastic blockmodel; Community detection
DOI
10.1093/biomet/asx008
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Moreover, in applications such as connectomics, social networks, and genomics, graph data are accompanied by contextualizing measures on each node. We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering. Statistical guarantees are provided under a joint mixture model that we call the node-contextualized stochastic blockmodel, including a bound on the misclustering rate. The bound is used to derive conditions for achieving perfect clustering. For most simulated cases, covariate-assisted spectral clustering yields results superior both to regularized spectral clustering without node covariates and to an adaptation of canonical correlation analysis. We apply our clustering method to large brain graphs derived from diffusion MRI data, using the node locations or neurological region membership as covariates. In both cases, covariate-assisted spectral clustering yields clusters that are easier to interpret neurologically.
Pages: 361-377
Number of pages: 17