An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment

被引:2
作者
Liu, Wenjie [1 ]
Li, Zhanhuai [1 ]
机构
[1] Northwestern Polytech Univ, Sch Comp, Xian 710072, Shaanxi, Peoples R China
基金
中国国家自然科学基金;
关键词
N-hop neighborhoods; graph mining; parallel computing; distributed computing;
D O I
10.1007/s11704-018-7167-0
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
N-hop neighborhoods information is very useful in analytic tasks on large-scale graphs, like finding clique in a social network, recommending friends or advertising links according to one's interests, predicting links among websites and etc. To get the N-hop neighborhoods information on a large graph, such as a web graph, a twitter social graph, the most straightforward method is to conduct a breadth first search (BFS) on a parallel distributed graph processing framework, such as Pregel and GraphLab. However, due to the massive volume of message transfer, the BFS method results in high communication cost and has low efficiency.In this work, we propose a key/value based method, namely KVB, which perfectly fits into the prevailing parallel graph processing framework and computes N-hop neighborhoods on a large scale graph efficiently. Unlike the BFS method, our method need not transfer large amount of neighborhoods information, thus, significantly reduces the overhead on both the communication and intermediate results in the distributed framework.We formalize the N-hop neighborhoods query processing as an optimization problem based on a set of quantitative cost metrics of parallel graph processing. Moreover, we propose a solution to efficiently load only the relevant neighborhoods for computation. Specially, we prove the optimal partial neighborhoods load problem is NP-hard and carefully design a heuristic strategy. We have implemented our algorithm on a distributed graph framework- Spark GraphX and validated our solution with extensive experiments over a number of real world and synthetic large graphs on a modest indoor cluster. Experiments show that our solution generally gains an order of magnitude speedup comparing to the state-of-art BFS implementation.
引用
收藏
页码:1309 / 1325
页数:17
相关论文
共 28 条
  • [1] Akaike H., 1998, 2 INT S INF THEOR, P199, DOI 10.1007/978-1-4612-1694-015
  • [2] [Anonymous], 2011, P HAD SUMM SANT CLAR
  • [3] Solutions to the st-connectivity problem using a GPU-based distributed BFS
    Bernaschi, Massimo
    Carbone, Giancarlo
    Mastrostefano, Enrico
    Vella, Flavio
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 76 : 145 - 153
  • [4] Bhat HS, 2010, DERIVATION BAYESIAN, P99
  • [5] Calinescu G, 2003, LECT NOTES COMPUT SC, V2865, P175
  • [6] Power-Law Distributions in Empirical Data
    Clauset, Aaron
    Shalizi, Cosma Rohilla
    Newman, M. E. J.
    [J]. SIAM REVIEW, 2009, 51 (04) : 661 - 703
  • [7] Csardi G., 2006, INTERJOURNAL COMPLEX, V1695, P1, DOI DOI 10.3724/SP.J.1087.2009.02191
  • [8] Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs
    Da Yan
    Cheng, James
    Yi Lu
    Ng, Wilfred
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (14): : 1981 - 1992
  • [9] DIOP M, 2013, 2013 IFIP, P1
  • [10] Fang YX, 2016, PROC VLDB ENDOW, V9, P1233