Overlapping Community Detection Using Neighborhood-Inflated Seed Expansion

被引:193
作者
Whang, Joyce Jiyoung [1 ]
Gleich, David F. [2 ]
Dhillon, Inderjit S. [3 ]
机构
[1] Sungkyunkwan Univ, Dept Comp Engn, Suwon, South Korea
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
基金
美国国家科学基金会;
关键词
Community detection; clustering; overlapping communities; seed expansion; seeds; personalized PageRank; NETWORKS; CUTS;
D O I
10.1109/TKDE.2016.2518687
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Community detection is an important task in network analysis. A community ( also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this paper, we propose an efficient overlapping community detection algorithm using a seed expansion approach. The key idea of our algorithm is to find good seeds, and then greedily expand these seeds based on a community metric. Within this seed expansion method, we investigate the problem of how to determine good seed nodes in a graph. In particular, we develop new seeding strategies for a personalized PageRank clustering scheme that optimizes the conductance community score. An important step in our method is the neighborhood inflation step where seeds are modified to represent their entire vertex neighborhood. Experimental results show that our seed expansion algorithm outperforms other state-of-the-art overlapping community detection methods in terms of producing cohesive clusters and identifying ground-truth communities. We also show that our new seeding strategies are better than existing strategies, and are thus effective in finding good overlapping communities in real-world networks.
引用
收藏
页码:1272 / 1284
页数:13
相关论文
共 32 条
  • [1] Link communities reveal multiscale complexity in networks
    Ahn, Yong-Yeol
    Bagrow, James P.
    Lehmann, Sune
    [J]. NATURE, 2010, 466 (7307) : 761 - U11
  • [2] Andersen R, 2006, ANN IEEE SYMP FOUND, P475
  • [3] [Anonymous], 2008, P 1 WORKSH ONL SOC N, DOI 10.1145/1397735.1397742
  • [4] [Anonymous], 2012, P 18 ACM SIGKDD INT
  • [5] [Anonymous], 2012, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, DOI [10.1145/2339530.2339628, DOI 10.1145/2339530.2339628]
  • [6] [Anonymous], 2012, Networks, Crowds, and Markets
  • [7] [Anonymous], 1992, Structural Holes
  • [8] [Anonymous], 2013, WSDM, DOI DOI 10.1145/2433396.2433471
  • [9] Emergence of scaling in random networks
    Barabási, AL
    Albert, R
    [J]. SCIENCE, 1999, 286 (5439) : 509 - 512
  • [10] Fast Matrix Computations for Pairwise and Columnwise Commute Times and Katz Scores
    Bonchi, Francesco
    Esfandiar, Pooya
    Gleich, David F.
    Greif, Chen
    Lakshmanan, Laks V. S.
    [J]. INTERNET MATHEMATICS, 2012, 8 (1-2) : 73 - 112