Hypothesis testing for automated community detection in networks

被引:112
作者
Bickel, Peter J. [1 ]
Sarkar, Purnamrita [2 ]
机构
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
基金
美国国家科学基金会;
关键词
Asymptotic analysis; Community detection; Hypothesis testing; Networks; Stochastic block model; Tracy-Widom distribution; STOCHASTIC BLOCKMODELS; UNIVERSALITY; EIGENVALUES; MODEL; EDGE;
D O I
10.1111/rssb.12117
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold; first, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdos-Renyi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real world networks with ground truth, we show that our algorithm outperforms state of the art methods.
引用
收藏
页码:253 / 273
页数:21
相关论文
共 34 条
  • [1] Adamic LA, 2005, P 3 INT WORKSH LINK, P36
  • [2] Airoldi E. M., 2013, Advances in Neural Information Processing Systems, P692
  • [3] Airoldi EM, 2008, J MACH LEARN RES, V9, P1981
  • [4] PSEUDO-LIKELIHOOD METHODS FOR COMMUNITY DETECTION IN LARGE SPARSE NETWORKS
    Amini, Arash A.
    Chen, Aiyou
    Bickel, Peter J.
    Levina, Elizaveta
    [J]. ANNALS OF STATISTICS, 2013, 41 (04) : 2097 - 2122
  • [5] [Anonymous], 2000, Icml
  • [6] [Anonymous], 2014, ARXIV PREPRINT ARXIV
  • [7] [Anonymous], ADV NEURAL INFORM PR
  • [8] Properties of sufficiency and statistical tests
    Bartlett, MS
    [J]. PROCEEDINGS OF THE ROYAL SOCIETY OF LONDON SERIES A-MATHEMATICAL AND PHYSICAL SCIENCES, 1937, 160 (A901) : 0268 - 0282
  • [9] A nonparametric view of network models and Newman-Girvan and other modularities
    Bickel, Peter J.
    Chen, Aiyou
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (50) : 21068 - 21073
  • [10] Isotropic local laws for sample covariance and generalized Wigner matrices
    Bloemendal, Alex
    Erdos, Laszlo
    Knowles, Antti
    Yau, Horng-Tzer
    Yin, Jun
    [J]. ELECTRONIC JOURNAL OF PROBABILITY, 2014, 19