Parallel incremental dynamic community detection algorithm based on Spark

被引:0
作者
Wu B. [1 ,2 ]
Xiao Y. [1 ,2 ]
Zhang Y. [1 ,2 ]
机构
[1] Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing
[2] School of Computer Science, Beijing University of Posts and Telecommunications, Beijing
来源
Qinghua Daxue Xuebao/Journal of Tsinghua University | 2017年 / 57卷 / 10期
关键词
Increment computation; Parallel dynamic community detection; Permanence; Spark;
D O I
10.16511/j.cnki.qhdxxb.2017.25.041
中图分类号
学科分类号
摘要
Dynamic community detection is a crucial step in researching network evolution. Nonparallel algorithms cannot efficiently mine community structures with large amounts of data. This paper presents a parallel incremental dynamic community detection algorithm based on Spark (PIDCDS), which maximizes the total permanence of the vertices in the network to discover the community structure. PIDCDS uses parallel computations on GraphX to calculate a specialized permanence as the community partition metric. Only the permanence of the incremental vertices is computed in each step to derive a new community structure. This algorithm is accurate with less computations. PIDCDS has better stability and more accurately detects the community structure than the FacetNet algorithm. PIDCDS performs well and the execution time increases only slowly as the network scale increases. More cores will accelerate PIDCDS to some extent. © 2017, Tsinghua University Press. All right reserved.
引用
收藏
页码:1030 / 1037
页数:7
相关论文
共 16 条
[1]  
Li G., Network science: The meta science in the 21st century, Communications of the China Computer Federation, 12, 4, (2016)
[2]  
Fortunato S., Community detection in graphs, Physics Reports, 486, 3, pp. 75-174, (2010)
[3]  
Girvan M., Newman M.E.J., Community structure in social and biological networks, Proceedings of the National Academy of Sciences, 99, 12, pp. 7821-7826, (2002)
[4]  
Newman M.E.J., Modularity and community structure in networks, Proceedings of the National Academy of Sciences, 103, 23, pp. 8577-8582, (2006)
[5]  
Wang X., Li X., Chen G., Network Science: An Introduction, (2012)
[6]  
Chakraborty T., Srinivasan S., Ganguly N., Et al., On the permanence of vertices in network communities, Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1396-1405, (2014)
[7]  
Ning H., Xu W., Chi Y., Et al., Incremental spectral clustering by efficiently updating the eigen-system, Pattern Recognition, 43, 1, pp. 113-127, (2010)
[8]  
Zaharia M., Chowdhury M., Das T., Et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, pp. 141-146, (2012)
[9]  
Hadoop
[10]  
Xin R.S., Gonzalez J.E., Franklin M.J., Et al., GraphX: A resilient distributed graph system on spark, Proceedings of the 1st Int Workshop on Graph Data Management Experiences and Systems, (2013)