From "Think Like a Vertex" to "Think Like a Graph"

被引:206
作者
Tian, Yuanyuan [1 ]
Balmin, Andrey [2 ]
Corsten, Severin Andreas [3 ]
Tatikonda, Shirish [1 ]
McPherson, John [1 ]
机构
[1] IBM Almaden Res Ctr, Yorktown Hts, NY 10598 USA
[2] GraphSQL, Redwood City, CA 94065 USA
[3] IBM Deutschland GmbH, Berlin, Germany
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013年 / 7卷 / 03期
关键词
D O I
10.14778/2732232.2732238
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a "think like a vertex" programming model to support iterative graph computation. This vertex-centric model is easy to program and has been proved useful for many graph algorithms. However, this model hides the partitioning information from the users, thus prevents many algorithm-specific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we propose a new "think like a graph" programming paradigm. Under this graph-centric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph++, based on Apache Giraph, an open source implementation of Pregel. We explore the applicability of the graph-centric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on well-partitioned data. For example, on a web graph with 118 million vertices and 855 million edges, the graph-centric version of connected component detection algorithm runs 63X faster and uses 204X fewer network messages than its vertex- centric counterpart.
引用
收藏
页码:193 / 204
页数:12
相关论文
共 24 条
  • [1] Balmin A., 2004, P 30 INT C VER LARG, V30, P564
  • [2] Boldi Paolo, 2004, P 13 INT C ONWORLD W, P595
  • [3] Boldi Paolo, 2011, P 20 INT C WORLD WID, P587, DOI DOI 10.1145/1963405.1963488
  • [4] The anatomy of a large-scale hypertextual Web search engine
    Brin, S
    Page, L
    [J]. COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7): : 107 - 117
  • [5] Cheng R., 2012, P 7 ACM EUROPEAN C C, P85
  • [6] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
  • [7] On asynchronous iterations
    Frommer, A
    Szyld, DB
    [J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2000, 123 (1-2) : 201 - 216
  • [8] Gonzalez J.E., 2012, P 10 USENIX S OPERAT, V12, P17
  • [9] Huang JW, 2011, PROC VLDB ENDOW, V4, P1123
  • [10] A fast and high quality multilevel scheme for partitioning irregular graphs
    Karypis, G
    Kumar, V
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1998, 20 (01) : 359 - 392