STRATEGIES FOR ONLINE INFERENCE OF MODEL-BASED CLUSTERING IN LARGE AND GROWING NETWORKS

Cited by: 19
Authors
Zanghi, Hugo [1]
Picard, Franck [2]
Miele, Vincent [2]
Ambroise, Christophe [3]
Affiliations
[1] Exalead, F-75008 Paris, France
[2] UCB Lyon 1, Lab Biometrie & Biol Evolut, F-69622 Villeurbanne, France
[3] CNRS, INRA, Lab Stat & Genome, UEVE 1152,UMR 8071, F-91000 Evry, France
Keywords
Graph clustering; EM Algorithms; online strategies; web graph structure analysis; MIXED MEMBERSHIP; EM ALGORITHM; MIXTURE; CONVERGENCE; PREDICTION;
DOI
10.1214/10-AOAS359
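Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract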
In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm, and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connexion structure of the political websphere during the US political campaign in 2008. We show that our online EM-based algorithms offer a good trade-off between precision and speed, when estimating parameters for mixture distributions in the context of random graphs.
Pages: 687-714
Page count: 28