UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph

被引:15
作者
Song, Junshuai [1 ,2 ]
Luo, Xiongcai [1 ,2 ]
Gao, Jun [1 ,2 ]
Zhou, Chang [3 ]
Wei, Hu [3 ]
Yu, Jeffery Xu [4 ]
机构
[1] Peking Univ, Minist Educ, Key Lab High Confidence Software Technol, Beijing 100080, Peoples R China
[2] Peking Univ, Sch EECS, Beijing 100080, Peoples R China
[3] Alibaba Grp, Hangzhou 311121, Zhejiang, Peoples R China
[4] Chinese Univ Hong Kong, Dept Syst Engn, Sha Tin, Hong Kong, Peoples R China
关键词
SimRank; Monte Carlo; random walk; distributed graph processing;
D O I
10.1109/TKDE.2017.2779126
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
SimRank is an important measure of vertex-pair similarity according to the structure of graphs. Although progress has been achieved, existing methods still face challenges to handle large graphs. Besides huge index construction and maintenance cost, existing methods may require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, UniWalk, to enable the fast top-k SimRank computation over large undirected graphs. UniWalk directly locates the top-k similar vertices for any single source vertex u via R sampling paths originating from u, which avoids selecting candidate vertex set C and the following OojCjRthorn bidirectional sampling paths. We also devise a path enumeration strategy to improve the SimRank precision by using path probabilities instead of path frequencies when sampling, a space-efficient method to reduce intermediate results, and a path-sharing strategy to lower the redundant path sampling cost for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability, and outperforms the state-of-the-art methods by orders of magnitude.
引用
收藏
页码:992 / 1006
页数:15
相关论文
共 22 条
[1]  
[Anonymous], 2010, KDD
[2]  
[Anonymous], 2010, P 2010 ACM SIGMOD IN, DOI [10.1145/1807167.1807184, DOI 10.1145/1807167.1807184]
[3]  
[Anonymous], 2014, OSDI 14
[4]  
[Anonymous], 2005, P 14 INT C WORLD WID, DOI DOI 10.1145/1060745.1060839
[5]  
[Anonymous], 2012, P 10 USENIX S OP SYS
[6]  
Bahmani B, 2010, PROC VLDB ENDOW, V4, P173
[7]  
Chakrabarti D, 2004, SIAM PROC S, P442
[8]  
Gao J, 2014, PROC INT CONF DATA, P556, DOI 10.1109/ICDE.2014.6816681
[9]  
Jeh G., 2002, P 8 ACM SIGKDD INT C, P538
[10]   Scalable Similarity Search for SimRank [J].
Kusumoto, Mitsuru ;
Maehara, Takanori ;
Kawarabayashi, Ken-ichi .
SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, :325-336