Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery

被引:208
作者
Abbe, Emmanuel [1 ]
Sandon, Colin [2 ]
机构
[1] Princeton Univ, PACM & EE Dept, Princeton, NJ 08544 USA
[2] Princeton Univ, Dept Math, Princeton, NJ 08544 USA
来源
2015 IEEE 56TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE | 2015年
关键词
Community detection; stochastic block models; phase transitions; clustering algorithms; information measures; graph-based codes; BLOCKMODELS; GRAPHS;
D O I
10.1109/FOCS.2015.47
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
New phase transition phenomena have recently been discovered for the stochastic block model, for the special case of two non-overlapping symmetric communities. This gives raise in particular to new algorithmic challenges driven by the thresholds. This paper investigates whether a general phenomenon takes place for multiple communities, without imposing symmetry. In the general stochastic block model SBM(n,p, W), n vertices are split into k communities of relative size {p(i)}(i is an element of[k]), and vertices in community i and j connect independently with probability {Wi,j}i,je [k]. This paper investigates the partial and exact recovery of communities in the general SBM (in the constant and logarithmic degree regimes), and uses the generality of the results to tackle overlapping communities. The contributions of the paper are: (i) an explicit characterization of the recovery threshold in the general SBM in terms of a new f-divergence function D+, which generalizes the Hellinger and Chernoff divergences, and which provides an operational meaning to a divergence function analog to the KL-divergence in the channel coding theorem, (ii) the development of an algorithm that recovers the communities all the way down to the optimal threshold and runs in quasi-linear time, showing that exact recovery has no information-theoretic to computational gap for multiple communities, (iii) the development of an efficient algorithm that detects communities in the constant degree regime with an explicit accuracy bound that can be made arbitrarily close to 1 when a prescribed signal-to-noise ratio (defined in term of the spectrum of diag(p)W) tends to infinity.
引用
收藏
页码:670 / 688
页数:19
相关论文
共 71 条
  • [41] STATISTICAL-ANALYSIS OF MULTIPLE SOCIOMETRIC RELATIONS
    FIENBERG, SE
    MEYER, MM
    WASSERMAN, SS
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1985, 80 (389) : 51 - 67
  • [42] Community structure in social and biological networks
    Girvan, M
    Newman, MEJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (12) : 7821 - 7826
  • [43] Gopalan P., 2013, P NATL ACAD SCI
  • [44] Heimlicher S., 2012, ARXIV12092910
  • [45] STOCHASTIC BLOCKMODELS - 1ST STEPS
    HOLLAND, PW
    LASKEY, KB
    LEINHARDT, S
    [J]. SOCIAL NETWORKS, 1983, 5 (02) : 109 - 137
  • [46] Consistent Shape Maps via Semidefinite Programming
    Huang, Qi-Xing
    Guibas, Leonidas
    [J]. COMPUTER GRAPHICS FORUM, 2013, 32 (05) : 177 - 186
  • [47] The Metropolis algorithm for graph bisection
    Jerrum, M
    Sorkin, GB
    [J]. DISCRETE APPLIED MATHEMATICS, 1998, 82 (1-3) : 155 - 175
  • [48] Cluster analysis for gene expression data: A survey
    Jiang, DX
    Tang, C
    Zhang, AD
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (11) : 1370 - 1386
  • [49] Stochastic blockmodels and community structure in networks
    Karrer, Brian
    Newman, M. E. J.
    [J]. PHYSICAL REVIEW E, 2011, 83 (01):
  • [50] Leskovec J., 2008, WWW