Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks

被引:16
作者
Wang, Sibo [1 ]
Tao, Yufei [2 ]
机构
[1] Univ Queensland, Brisbane, Qld, Australia
[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China
来源
SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2018年
关键词
Heavy Hitters; Personalized PageRank; Approximate Algorithms; Social Recommendation; COMPUTATION;
D O I
10.1145/3183713.3196919
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) oft with respect to s is the probability that a random walk starting from s terminates at t. The average of the personalized PageRank score of t with respect to each source node v is an element of V is exactly the PageRank score pi(t) of node t, which denotes the overall importance of node t in the graph. A heavy hitter of node t is a node whose contribution to pi(t) is above a phi fraction, where phi is a value between 0 and 1. Finding heavy hitters has important applications in link spam detection, classification of web pages, and friend recommendations. In this paper, we propose BLOG, an efficient framework for three types of heavy hitter queries: the pairwise approximate heavy hitter (AHH), the reverse AHH, and the multi-source reverse AHH queries. For pairwise AHH queries, our algorithm combines the Monte-Carlo approach and the backward propagation approach to reduce the cost of both methods, and incorporates new techniques to deal with high in-degree nodes. For reverse AHH and multi-source reverse AHH queries, our algorithm extends the ideas behind the pairwise AHH algorithm with a new "logarithmic bucketing" technique to improve the query efficiency. Extensive experiments demonstrate that our BLOG is far more efficient than alternative solutions on the three queries.
引用
收藏
页码:1113 / 1127
页数:15
相关论文
共 32 条
  • [1] Andersen R, 2007, LECT NOTES COMPUT SC, V4863, P150
  • [2] Andersen R, 2006, ANN IEEE SYMP FOUND, P475
  • [3] [Anonymous], 2006, TECHNICAL REPORT
  • [4] [Anonymous], 2013, ICDCN
  • [5] [Anonymous], 2005, AIRWEB
  • [6] [Anonymous], 1999, PAGERANK CITATION RA
  • [7] [Anonymous], 2013, WTF: The Who to Follow Service at Twitter, DOI DOI 10.1145/2488388.2488433
  • [8] Monte Carlo methods in pagerank computation: When one iteration is sufficient
    Avrachenkov, K.
    Litvak, N.
    Nemirovsky, D.
    Osipova, N.
    [J]. SIAM JOURNAL ON NUMERICAL ANALYSIS, 2007, 45 (02) : 890 - 904
  • [9] Backstrom L, 2011, P 4 ACM INT C WEB SE, P635, DOI [DOI 10.1145/1935826.1935914, 10.1145/1935826.1935914]
  • [10] Bahmani B., 2011, ACM SIGMOD, P973