Efficient Algorithms for Group Hitting Probability Queries on Large Graphs

被引:0
作者
Guo, Qintian [1 ]
Lin, Dandan [2 ]
Wang, Sibo [1 ]
Wong, Raymond Chi-Wing [3 ]
Lin, Wenqing [4 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Shenzhen Inst Comp Sci, Shenzhen 518172, Guangdong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[4] Tencent Inc, Shenzhen 518054, Guangdong, Peoples R China
关键词
Social networking (online); Data mining; Toy manufacturing industry; Image edge detection; Computational efficiency; Approximation algorithms; Web pages; graph algorithms; graphs and networks; COMPUTATION;
D O I
10.1109/TKDE.2023.3349164
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Given a source node s and a target node t, the hitting probability tells us how likely an alpha-terminating random walk (which stops with probability $\alpha$alpha at each step) starting from s can hit t before it stops. This concept originates from the hitting time, a classic concept in random walks. In this paper, we focus on the group hitting probability (GHP) where the target is a set of nodes, measuring the node-to-group structural proximity. For this group version of the hitting probability, we present efficient algorithms for two types of GHP queries: the pairwise query which returns the GHP value of a target set T with respect to (w.r.t.) a source node s, and the top-k query which returns the top-k target sets with the largest GHP value w.r.t. a source node s. We first develop an efficient algorithm named SAMBA for the pairwise query, which is built on a group local push algorithm tailored for GHP, with rigorous analysis for correctness. Next, we show how to speed up SAMBA by combining the group local push algorithm with the Monte Carlo approach, where GHP brings new challenges as it might need to consider every hop of the random walk. We tackle this issue with a new formulation of the GHP and show how to provide approximation guarantees with a detailed theoretical analysis. With SAMBA as the backbone, we develop an iterative algorithm for top-k queries, which adaptively refines the bounds for the candidate target sets, and terminates as soon as it meets the stopping condition, thus saving unnecessary computational costs. We further present an optimization technique to accelerate the top-k query, improving its practical performance. Extensive experiments show that our solutions are orders of magnitude faster than their competitors.
引用
收藏
页码:2995 / 3008
页数:14
相关论文
共 47 条
  • [1] Andersen R, 2007, LECT NOTES COMPUT SC, V4863, P150
  • [2] Andersen R, 2006, ANN IEEE SYMP FOUND, P475
  • [3] [Anonymous], 2014, SNAP DATASETS
  • [4] Backstrom L, 2011, P 4 ACM INT C WEB SE, V11, P635
  • [5] Bahmani B, 2010, PROC VLDB ENDOW, V4, P173
  • [6] Benczur A.A., 2006, P INT WORKSH ADV INF, P9
  • [7] Bookmark-Coloring Algorithm for Personalized PageRank Computing
    Berkhin, Pavel
    [J]. INTERNET MATHEMATICS, 2006, 3 (01) : 41 - 62
  • [8] Brand M, 2005, SIAM PROC S, P12
  • [9] Chen Wen-Yen., 2008, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), P115
  • [10] Concentration Inequalities and Martingale Inequalities: A Survey
    Fan Chung
    Lu, Linyuan
    [J]. INTERNET MATHEMATICS, 2006, 3 (01) : 79 - 127