Random Walk with Restart on Large Graphs Using Block Elimination

Cited by: 30
Authors
Jung, Jinhong [1 ]
Shin, Kijung [2 ]
Sael, Lee [3 ]
Kang, U. [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul 151, South Korea
[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
[3] SUNY Korea, Dept Comp Sci, Inchon, South Korea
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2016, Vol. 41, No. 2
Funding
National Research Foundation of Singapore;
Keywords
Design; Experimentation; Algorithms; Proximity; ranking in graph; random walk with restart; relevance score; PAGERANK;
DOI
10.1145/2901736
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Given a large graph, how can we calculate the relevance between nodes fast and accurately? Random walk with restart (RWR) provides a good measure for this purpose and has been applied to diverse data mining applications including ranking, community detection, link prediction, and anomaly detection. Since calculating RWR from scratch takes a long time, various preprocessing methods, most of which are related to inverting adjacency matrices, have been proposed to speed up the calculation. However, these methods do not scale to large graphs because they usually produce large dense matrices that do not fit into memory. In addition, the existing methods are inappropriate when graphs dynamically change because the expensive preprocessing task needs to be repeated. In this article, we propose BEAR, a fast, scalable, and accurate method for computing RWR on large graphs. BEAR has two versions: a preprocessing method BEAR-S for static graphs and an incremental update method BEAR-D for dynamic graphs. BEAR-S consists of the preprocessing step and the query step. In the preprocessing step, BEAR-S reorders the adjacency matrix of a given graph so that it contains a large and easy-to-invert submatrix, and precomputes several matrices including the Schur complement of the submatrix. In the query step, BEAR-S quickly computes the RWR scores for a given query node using a block elimination approach with the matrices computed in the preprocessing step. For dynamic graphs, BEAR-D efficiently updates the changed parts in the preprocessed matrices of BEAR-S based on the observation that only small parts of the preprocessed matrices change when few edges are inserted or deleted. Through extensive experiments, we show that BEAR-S significantly outperforms other state-of-the-art methods in terms of preprocessing and query speed, space efficiency, and accuracy. We also show that BEAR-D quickly updates the preprocessed matrices and immediately computes queries when the graph changes.
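The abstract describes solving the RWR linear system by block elimination with a precomputed Schur complement. Below is a minimal NumPy sketch of that idea, not the authors' implementation: it forms the system matrix H = I − (1 − c)·Ã^T (Ã the row-normalized adjacency matrix, c the restart probability, q the query vector, following common RWR notation rather than necessarily the paper's), partitions H into 2×2 blocks, precomputes the inverse of the leading block and its Schur complement, and recovers the RWR scores by block back-substitution. The function name, block split n1 = n/2, and restart probability 0.15 are illustrative assumptions; BEAR-S additionally reorders the graph so that the leading block is cheap to invert, which this toy version omits.

```python
# Minimal sketch of RWR via 2x2 block elimination (illustrative only,
# not the BEAR-S code; block names follow standard Schur-complement notation).
import numpy as np

def rwr_block_elimination(A, q, c=0.15, n1=None):
    """Solve (I - (1-c) * A_norm^T) r = c*q by block elimination.

    A  : dense adjacency matrix (n x n)
    q  : query (restart) distribution, length n
    c  : restart probability
    n1 : size of the leading block (assumed easy to invert)
    """
    n = A.shape[0]
    n1 = n1 if n1 is not None else n // 2
    # Row-normalize the adjacency matrix and form the RWR system H r = b.
    A_norm = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    H = np.eye(n) - (1 - c) * A_norm.T
    b = c * q

    # Partition H and b into 2x2 blocks.
    H11, H12 = H[:n1, :n1], H[:n1, n1:]
    H21, H22 = H[n1:, :n1], H[n1:, n1:]
    b1, b2 = b[:n1], b[n1:]

    # "Preprocessing": invert the leading block and form the Schur complement
    # S = H22 - H21 H11^{-1} H12.
    H11_inv = np.linalg.inv(H11)
    S = H22 - H21 @ H11_inv @ H12

    # "Query": block back-substitution, solving for r2 first, then r1.
    r2 = np.linalg.solve(S, b2 - H21 @ (H11_inv @ b1))
    r1 = H11_inv @ (b1 - H12 @ r2)
    return np.concatenate([r1, r2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = (rng.random((8, 8)) < 0.3).astype(float)
    np.fill_diagonal(A, 0)
    A += np.eye(8) * (A.sum(axis=1) == 0)   # self-loops for dangling nodes
    q = np.zeros(8)
    q[0] = 1.0                               # seed (query) node 0
    r = rwr_block_elimination(A, q)
    # Sanity check against a direct solve of the full system.
    A_norm = A / A.sum(axis=1, keepdims=True)
    r_direct = np.linalg.solve(np.eye(8) - 0.85 * A_norm.T, 0.15 * q)
    assert np.allclose(r, r_direct)
    print(r.round(4))
```

In this sketch the preprocessing cost is paid once (H11^{-1} and S), and each query only requires solves against the smaller Schur complement plus cheap block multiplications, which is the efficiency argument the abstract makes.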
Pages: 43