TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

Cited by: 54
Authors
Wei, Zhewei [1 ,5 ,6 ]
He, Xiaodong [1 ]
Xiao, Xiaokui [2 ,7 ]
Wang, Sibo [3 ]
Shang, Shuo [4 ]
Wen, Ji-Rong [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[3] Univ Queensland, Brisbane, Qld, Australia
[4] King Abdullah Univ Sci & Technol, CEMSE, Thuwal, Saudi Arabia
[5] Renmin Univ China, Beijing Key Lab Big Data Management & Anal Method, Beijing, Peoples R China
[6] Renmin Univ China, Key Lab Data Engn & Knowledge Engn, MOE, Beijing, Peoples R China
[7] Nanyang Technol Univ, Singapore, Singapore
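Source
SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2018
Funding
National Natural Science Foundation of China;
Keywords
Personalized PageRank; Top-k Queries;
DOI
10.1145/3183713.3196920
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract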
Personalized PageRank (PPR) is a classic metric that measures the relevance of graph nodes with respect to a source node. Given a graph G, a source node s, and a parameter k, a top-k PPR query returns a set of k nodes with the highest PPR values with respect to s. This type of query serves as an important building block for numerous applications in web search and social networks, such as Twitter's Who-To-Follow recommendation service. Existing techniques for top-k PPR, however, suffer from two major deficiencies. First, they either incur prohibitive space and time overheads on large graphs, or fail to provide any guarantee on the precision of top-k results (i.e., the results returned might miss a number of actual top-k answers). Second, most of them require significant pre-computation on the input graph G, which renders them unsuitable for graphs with frequent updates (e.g., Twitter's social graph). To address the deficiencies of existing solutions, we propose TopPPR, an algorithm for top-k PPR queries that ensures at least ρ precision (i.e., at least a ρ fraction of the actual top-k results are returned) with probability at least 1 - 1/n, where ρ ∈ (0, 1] is a user-specified parameter and n is the number of nodes in G. In addition, TopPPR offers non-trivial guarantees on query time in terms of ρ, and it can easily handle dynamic graphs as it does not require any preprocessing. We experimentally evaluate TopPPR using a variety of benchmark datasets, and demonstrate that TopPPR outperforms the state-of-the-art solutions in terms of both efficiency and precision, even when we set ρ = 1 (i.e., when TopPPR returns the exact top-k results). Notably, on a billion-edge Twitter graph, TopPPR requires only 15 seconds to answer a top-500 PPR query with ρ = 1.
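To make the query and the precision measure in the abstract concrete, the following is a minimal sketch in Python. It is not the TopPPR algorithm: it computes PPR on a tiny graph by plain power iteration, takes the k highest-scoring nodes, and measures precision as the fraction of the exact top-k set that a candidate answer contains. The restart probability `alpha = 0.2`, the handling of nodes without out-edges, the toy graph, and all function names are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List


def ppr_power_iteration(graph: Dict[int, List[int]], source: int,
                        alpha: float = 0.2, iters: int = 100) -> Dict[int, float]:
    """Approximate PPR values with respect to `source` by power iteration.

    Models a random walk that starts at `source` and, at each step, stops
    with probability `alpha` (assumed value) or moves to a uniformly chosen
    out-neighbor otherwise.
    """
    pi = {v: 0.0 for v in graph}
    pi[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[source] += alpha  # restart mass always returns to the source
        for u, out in graph.items():
            mass = (1.0 - alpha) * pi[u]
            if not out:
                # Assumption: a node with no out-edges sends its mass to the source.
                nxt[source] += mass
                continue
            share = mass / len(out)
            for v in out:
                nxt[v] += share
        pi = nxt
    return pi


def top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Return the k nodes with the highest scores (ties broken by node id)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


def precision(returned: List[int], exact: List[int]) -> float:
    """Fraction of the exact top-k nodes that appear in the returned set."""
    return len(set(returned) & set(exact)) / len(exact)


# Hypothetical 4-node directed graph as adjacency lists.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
pi = ppr_power_iteration(graph, source=0)
exact_top2 = top_k(pi, 2)
print(exact_top2, precision([0, 1], exact_top2))
```

In this sketch, an answer that shares one of two nodes with the exact top-2 set has precision 0.5, so it would satisfy a guarantee with ρ = 0.5 but not one with ρ = 1. Power iteration touches every edge in every round, which is the kind of cost on large graphs that motivates approximate methods with precision guarantees.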
Pages: 441-456
Page count: 16