Hypothesis testing for automated community detection in networks

被引：112

作者：

Bickel, Peter J. ^{[1
]}

Sarkar, Purnamrita ^{[2
]}

机构：

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Univ Texas Austin, Austin, TX 78712 USA

来源：

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY | 2016年 / 78卷 / 01期

基金：

美国国家科学基金会;

关键词：

Asymptotic analysis; Community detection; Hypothesis testing; Networks; Stochastic block model; Tracy-Widom distribution; STOCHASTIC BLOCKMODELS; UNIVERSALITY; EIGENVALUES; MODEL; EDGE;

D O I：

10.1111/rssb.12117

中图分类号：

O21 [概率论与数理统计]; C8 [统计学];

学科分类号：

020208 ; 070103 ; 0714 ;

摘要：

Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold; first, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdos-Renyi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real world networks with ground truth, we show that our algorithm outperforms state of the art methods.

引用

页码：253 / 273

页数：21

共 34 条

[1] Adamic LA, 2005, P 3 INT WORKSH LINK, P36
[2] Airoldi E. M., 2013, Advances in Neural Information Processing Systems, P692
[3] Airoldi EM, 2008, J MACH LEARN RES, V9, P1981
[4] PSEUDO-LIKELIHOOD METHODS FOR COMMUNITY DETECTION IN LARGE SPARSE NETWORKS
Amini, Arash A.
Chen, Aiyou
Bickel, Peter J.
Levina, Elizaveta
[J]. ANNALS OF STATISTICS, 2013, 41 (04) : 2097 - 2122
[5] [Anonymous], 2000, Icml
[6] [Anonymous], 2014, ARXIV PREPRINT ARXIV
[7] [Anonymous], ADV NEURAL INFORM PR
[8] Properties of sufficiency and statistical tests
Bartlett, MS
[J]. PROCEEDINGS OF THE ROYAL SOCIETY OF LONDON SERIES A-MATHEMATICAL AND PHYSICAL SCIENCES, 1937, 160 (A901) : 0268 - 0282
[9] A nonparametric view of network models and Newman-Girvan and other modularities
Bickel, Peter J.
Chen, Aiyou
[J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (50) : 21068 - 21073
[10] Isotropic local laws for sample covariance and generalized Wigner matrices
Bloemendal, Alex
Erdos, Laszlo
Knowles, Antti
Yau, Horng-Tzer
Yin, Jun
[J]. ELECTRONIC JOURNAL OF PROBABILITY, 2014, 19

← 1 2 3 4 →