Efficient Parallel Processing of Distance Join Queries Over Distributed Graphs

被引：3

作者：

Zhang, Xiaofei ^{[1
]}

Chen, Lei ^{[1
]}

Wang, Min ^{[2
]}

机构：

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

[2] Google Inc, Santa Clara, CA 94054 USA

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2015年 / 27卷 / 03期

基金：

中国国家自然科学基金;

关键词：

Distance join query; distributed graph processing;

D O I：

10.1109/TKDE.2014.2345383

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Distance join queries have recently been recognized as a particularly useful operation over graph data, since they capture graph similarity in a meaningful way. Consequently, they have been studied extensively in recent years [1], [2]. However, current methods are designed for centralized systems, and rely on the graph embedding for effective pruning and indexing. As graph sizes become very large and graph data must be deployed in the distributed environment, these techniques become impractical. In this work, we propose a solution for efficient parallel processing of distance join queries over distributed large graphs. There have been emerging efforts devoted to managing large graphs in distributed and parallel systems. Programming models like Pregel [3] and iterative computing framework like HaLoop [4] have been proposed to handle queries over distributed graphs. However, they are designed in the perspective of functionality instead of the query efficiency. In this work, we define an optimization problem: combining the iterative join and the graph exploration method to minimize the evaluation time of distance join queries. Without sacrificing a system's scalability, our technique exploits a light-weight vertex centric encoding schema built on a distance-aware partition of the entire graph. Extensive experiments over both real and synthetic large graphs show that, by employing an adaptive query plan generation and scheduling method, we can effectively reduce the redundant message passing and I/O costs. Compared to simply using iterative join or graph exploration method, our solution achieves as many as one order of magnitude of time saving for the query evaluation.

引用

页码：740 / 754

页数：15

共 39 条

[31] Potamias M., 2009, P 18 ACM C INF KNOWL, P867, DOI DOI 10.1145/1645953.1646063
[32] Shao B., 2012, Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, P589
[33] Sun L, 2011, PROC VLDB ENDOW, V4, P714
[34] Efficient Subgraph Matching on Billion Node Graphs
Sun, Zhao
Wang, Hongzhi
Wang, Haixun
Shao, Bin
Li, Jianzhong
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (09): : 788 - 799
[35] From "Think Like a Vertex" to "Think Like a Graph"
Tian, Yuanyuan
Balmin, Andrey
Corsten, Severin Andreas
Tatikonda, Shirish
McPherson, John
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 7 (03): : 193 - 204
[36] COMPREHENSIVIE ASSESSMENT OF ENTREPRENEURIAL DEVELOPING LEVEL OF CHINESE CITIES-BASED ON FACTOR ANALYSIS METHOD
Wei, Fan
[J]. PROCEEDINGS OF ACADEMY OF INNOVATION AND ENTREPRENEURSHIP 2010, 2010, : 99 - 104
[37] Wei Jin, 2011, Scientific and Statistical Database Management. Proceedings 23rd International Conference, SSDBM 2011, P293, DOI 10.1007/978-3-642-22351-8_18
[38] On Graph Query Optimization in Large Networks
Zhao, Peixiang
Han, Jiawei
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): : 340 - 351
[39] Distance-Join: Pattern Match Query In a Large Graph Database
Zou, Lei
Chen, Lei
Oezsu, M. Tamer
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (01): : 886 - 897

← 1 2 3 4 →