Combining a popularity-productivity stochastic block model with a discriminative-content model for general structure detection

Cited by: 16
Authors
Chai, Bian-fang [1, 2]
Yu, Jian [1]
Jia, Cai-yan [1]
Yang, Tian-bao [3]
Jiang, Ya-wen [1]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
[2] Shijiazhuang Univ Econ, Dept Informat Engn, Shijiazhuang 050031, Hebei, Peoples R China
[3] GE Global Res, San Ramon, CA 94583 USA
Source
PHYSICAL REVIEW E | 2013 / Vol. 88 / Issue 01
Funding
National Science Foundation (USA); Beijing Natural Science Foundation;
Keywords
COMMUNITY; PREDICTION; NETWORKS;
DOI
10.1103/PhysRevE.88.012807
Chinese Library Classification
O35 [Fluid mechanics]; O53 [Plasma physics];
Subject classification codes
070204 ; 080103 ; 080704 ;
Abstract
Latent community discovery that combines the links and contents of a text-associated network has drawn increasing attention with the advance of social media. Most previous studies aim to detect densely connected communities and cannot identify general structures, e.g., bipartite structure. Several variants of the stochastic block model are more flexible in exploring general structures because they introduce link probabilities between communities. However, these variants cannot capture the degree distributions of real networks because they do not model the differences among nodes, and they are not suitable for discovering communities in text-associated networks because they ignore the contents of nodes. In this paper, we propose a popularity-productivity stochastic block (PPSB) model that introduces two random variables, popularity and productivity, to model the differences among nodes in receiving links and producing links, respectively. This model has the flexibility of existing stochastic block models in discovering general community structures and inherits the richness of previous models that also exploit popularity and productivity in modeling real scale-free networks with power-law degree distributions. To incorporate the contents in text-associated networks, we propose a combined model that couples the PPSB model with a discriminative model of the community memberships of nodes given their contents. We then develop expectation-maximization (EM) algorithms to infer the parameters of the two models. Experiments on synthetic and real networks demonstrate that the proposed models outperform previous models, especially on networks with general structures.
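The abstract does not give the PPSB model's equations, so the following is only a toy illustration of the general idea it describes: block-level link weights allow non-assortative (e.g., bipartite) structure, while per-node productivity and popularity weights make nodes differ in how many links they send and receive. The function name `sample_links`, its parameters, and the three-step sampling scheme are all assumptions made for this sketch and should not be read as the paper's parameterization or inference procedure.

```python
import random


def sample_links(membership, productivity, popularity, block_weight,
                 n_links, seed=0):
    """Sample directed links from a toy block model (hypothetical sketch).

    membership[i]      : community index of node i
    productivity[i]    : relative tendency of node i to send links
    popularity[i]      : relative tendency of node i to receive links
    block_weight[r][s] : relative weight of links from community r to s
    """
    rng = random.Random(seed)
    k = len(block_weight)
    # Nodes grouped by community.
    members = [[i for i, z in enumerate(membership) if z == r]
               for r in range(k)]
    pairs = [(r, s) for r in range(k) for s in range(k)]
    pair_weights = [block_weight[r][s] for r, s in pairs]

    links = []
    for _ in range(n_links):
        # 1. Choose which pair of communities the link connects.
        r, s = rng.choices(pairs, weights=pair_weights)[0]
        # 2. Choose the sender within community r by productivity.
        src = rng.choices(members[r],
                          weights=[productivity[i] for i in members[r]])[0]
        # 3. Choose the receiver within community s by popularity.
        dst = rng.choices(members[s],
                          weights=[popularity[i] for i in members[s]])[0]
        links.append((src, dst))
    return links


# Assumed toy setting: two communities linked only to each other
# (a bipartite structure), with unequal popularity inside each one.
membership = [0, 0, 0, 1, 1, 1]
productivity = [1, 1, 1, 1, 1, 1]
popularity = [5, 1, 0, 5, 1, 0]   # nodes 2 and 5 never receive links
block_weight = [[0, 1], [1, 0]]   # no within-community links

links = sample_links(membership, productivity, popularity,
                     block_weight, n_links=200)
```

In this toy setting every sampled link crosses between the two communities, which is the kind of general structure that a model restricted to densely connected communities would miss. Fitting such a model to observed data (the EM step mentioned in the abstract) is not shown here.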
Pages: 10