Estimating graph distance and centrality on shared nothing architectures

被引:2
作者
Balkir, Atilla Soner [1 ]
Oktay, Huseyin [2 ]
Foster, Ian [1 ,3 ]
机构
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] Argonne Natl Lab, Lemont, IL 60439 USA
基金
美国国家卫生研究院;
关键词
graph mining; shortest paths; graph centrality; MapReduce; MAPREDUCE;
D O I
10.1002/cpe.3354
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We present a parallel toolkit for pairwise distance computation in massive networks. Computing the exact shortest paths between a large number of vertices is a costly operation, and serial algorithms are not practical for billion-scale graphs. We first describe an efficient parallel method to solve the single source shortest path problem on commodity hardware with no shared memory. Using it as a building block, we introduce a new parallel algorithm to estimate the shortest paths between arbitrary pairs of vertices. Our method exploits data locality, produces highly accurate results, and allows batch computation of shortest paths with 7% average error in graphs that contain billions of edges. The proposed algorithm is up to two orders of magnitude faster than previously suggested algorithms and does not require large amounts of memory or expensive high-end servers. We further leverage this method to estimate the closeness and betweenness centrality metrics, which involve systems challenges dealing with indexing, joining, and comparing large datasets efficiently. In one experiment, we mined a real-world Web graph with 700 million nodes and 12 billion edges to identify the most central vertices and calculated more than 63 billion shortest paths in 6 h on a 20-node commodity cluster. Copyright (C) 2014 John Wiley & Sons, Ltd.
引用
收藏
页码:3587 / 3613
页数:27
相关论文
共 43 条
[1]  
Afrati FN, 2013, PROC INT CONF DATA, P62, DOI 10.1109/ICDE.2013.6544814
[2]   Fuzzy Joins Using MapReduce [J].
Afrati, Foto N. ;
Das Sarma, Anish ;
Menestrina, David ;
Parameswaran, Aditya ;
Ullman, Jeffrey D. .
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, :498-509
[3]   Statistical mechanics of complex networks [J].
Albert, R ;
Barabási, AL .
REVIEWS OF MODERN PHYSICS, 2002, 74 (01) :47-97
[4]  
[Anonymous], 2011, P 20 ACM INT C INFOR
[5]  
[Anonymous], SYNTHESIS LECT DATA
[6]   Parallel algorithms for evaluating centrality indices in real-world networks [J].
Bader, David A. ;
Madduri, Kamesh .
2006 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS, 2006, :539-547
[7]   A faster algorithm for betweenness centrality [J].
Brandes, U .
JOURNAL OF MATHEMATICAL SOCIOLOGY, 2001, 25 (02) :163-177
[8]   Graph Twiddling in a MapReduce World [J].
Cohen, Jonathan .
COMPUTING IN SCIENCE & ENGINEERING, 2009, 11 (04) :29-41
[9]  
Cormen T.H., 2009, Introduction to Algorithms, V3rd ed., DOI [10.2307/2583667, DOI 10.2307/2583667]
[10]  
Croft Bruce, 2009, Search Engines: Information Retrieval in Practice, V1st