A Probabilistic Framework for Structural Analysis and Community Detection in Directed Networks

被引:15
作者
Chang, Cheng-Shang [1 ]
Lee, Duan-Shin [1 ]
Liou, Li-Heng [1 ]
Lu, Sheng-Min [1 ]
Wu, Mu-Huan [1 ]
机构
[1] Natl Tsing Hua Univ, Inst Commun Engn, Hsinchu 30013, Taiwan
关键词
Centrality; community; modularity; pagerank; RANDOM-WALKS; CENTRALITY;
D O I
10.1109/TNET.2017.2762403
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
There is growing interest in structural analysis of directed networks. Two major points that need to be addressed are: 1) a formal and precise definition of the graph clustering and community detection problem in directed networks and 2) algorithm design and evaluation of community detection algorithms in directed networks. Motivated by these, we develop a probabilistic framework for structural analysis and community detection in directed networks based on our previous work in undirected networks. By relaxing the assumption from symmetric bivariate distributions in our previous work to bivariate distributions that have the same marginal distributions in this paper, we can still formally define various notions for structural analysis in directed networks, including centrality, relative centrality, community, and modularity. We also extend three commonly used community detection algorithms in undirected networks to directed networks: the hierarchical agglomerative algorithm, the partitional algorithm, and the fast unfolding algorithm. These are made possible by two modularity preserving and sparsity preserving transformations. In conjunction with the probabilistic framework, we show these three algorithms converge in a finite number of steps. In particular, we show that the partitional algorithm is a linear time algorithm for large sparse graphs. Moreover, the outputs of the hierarchical agglomerative algorithm and the fast unfolding algorithm are guaranteed to be communities. These three algorithms can also be extended to general bivariate distributions with some minor modifications. We also conduct various experiments by using two sampling methods in directed networks: 1) PageRank and 2) random walks with self-loops and backward jumps.
引用
收藏
页码:31 / 46
页数:16
相关论文
共 37 条
  • [1] Adamic L. A., 2005, P 3 INT WORKSH LINK, P36
  • [2] Andersen R, 2006, ANN IEEE SYMP FOUND, P475
  • [3] [Anonymous], 2010, P 19 INT C WORLD WID, DOI DOI 10.1145/1772690.1772755
  • [4] [Anonymous], 2006, INTERJOURNALCOMPLEX
  • [5] [Anonymous], 1995, Probability, stochastic processes, and queueing theory: the mathematics of computer performance modeling
  • [6] [Anonymous], 2009, NETWORKS INTRO
  • [7] Fast unfolding of communities in large networks
    Blondel, Vincent D.
    Guillaume, Jean-Loup
    Lambiotte, Renaud
    Lefebvre, Etienne
    [J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
  • [8] The anatomy of a large-scale hypertextual Web search engine
    Brin, S
    Page, L
    [J]. COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7): : 107 - 117
  • [9] A Mathematical Theory for Clustering in Metric Spaces
    Chang, Cheng-Shang
    Liao, Wanjiun
    Chen, Yu-Sheng
    Liou, Li-Heng
    [J]. IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2016, 3 (01): : 2 - 16
  • [10] A Probabilistic Framework for Structural Analysis in Directed Networks
    Chang, Cheng-Shang
    Lee, Duan-Shin
    Liou, Li-Heng
    Lu, Sheng-Min
    Wu, Mu-Huan
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2016,