Clustering of dynamic networks through subgraphs : a study of the Enron scandal

被引:0
作者
Zreik, Rawya [1 ,2 ]
Latouche, Pierre [1 ]
Bouveyron, Charles [2 ]
机构
[1] Univ Paris 01, Lab SAMM, EA 4543, F-75231 Paris 05, France
[2] Univ Paris 05, Lab MAP5, CNRS, UMR 8145, Paris, France
来源
JOURNAL OF THE SFDS | 2015年 / 156卷 / 03期
关键词
Dynamic networks; subgraphs; random subgraph model; state space model; Automatic classification; variational inference; variational expectation maximization; Enron data;
D O I
暂无
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
In recent years, many clustering methods have been proposed to extract information from networks. The principle is to look for groups of vertices with homogenous connection profiles. Most of the models for clustering are suitable for static networks, that is to say, not taking into account the temporal dimension, but can handle different types of edges, whether binary or discrete. This work is motivated by the will of analysing an evolving network describing email communications between employees of the Enron company where social positions play an important role. Therefore, in this paper, we consider the random subgraph model (RSM) which was proposed recently to model a network through latent clusters built within a known partition into subgraphs. Using a state space model to characterize the cluster proportions, RSM is then extended in order to deal with dynamic networks. We call the latter the dynamic random subgraph model (dRSM). A variational expectation maximisation (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to a new state space model from which the parameters along with hidden states can be estimated using the standard Kalman filter and Rauch-Tung-Striebel (RTS) smoother. Simulated data sets are considered to assess the proposed approach. The methodology is finally applied to the Enron email data set and allows to discover an early reaction of the partners and directors compared to the other employees regarding the coming scandal.
引用
收藏
页码:166 / 191
页数:26
相关论文
共 33 条
[1]  
Ahmed A, 2007, P CVPR, P1
[2]  
Airoldi EM, 2008, J MACH LEARN RES, V9, P1981
[3]  
[Anonymous], 2013, INT C MACHINE LEARNI
[4]  
[Anonymous], 1981, SOCIOL METHODOL, DOI [10.2307/270741, DOI 10.2307/270741]
[5]   A nonparametric view of network models and Newman-Girvan and other modularities [J].
Bickel, Peter J. ;
Chen, Aiyou .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (50) :21068-21073
[6]  
BLEI D., 2006, ADV NEURAL INFORM PR, P147
[7]   A CORRELATED TOPIC MODEL OF SCIENCE [J].
Blei, David M. ;
Lafferty, John D. .
ANNALS OF APPLIED STATISTICS, 2007, 1 (01) :17-35
[8]  
Csiszar I., 1984, STAT DECISIONS SUPPL
[9]   A mixture model for random graphs [J].
Daudin, J. -J. ;
Picard, F. ;
Robin, S. .
STATISTICS AND COMPUTING, 2008, 18 (02) :173-183
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38