Improving Distributed Subgraph Matching Algorithm on Timely Dataflow

被引:1
作者
Lai, Zhengmin [1 ]
Yang, Zhengyi [2 ]
Lai, Longbin [2 ]
机构
[1] East China Normal Univ, Shanghai, Peoples R China
[2] Univ New South Wales, Sydney, NSW, Australia
来源
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW 2019) | 2019年
关键词
subgraph matching; cost evaluation; distributed algorithm; dataflow; QUERY OPTIMIZATION; ISOMORPHISM;
D O I
10.1109/ICDEW.2019.000-2
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The subgraph matching problem is defined to find all subgraphs of a data graph that are isomorphic to a given query graph. Subgraph matching plays a vital role in the fields of e-commerce, social media and biological science. CliqueJoin is a distributed subgraph matching algorithm that is designed to be efficient and scalable. However, CliqueJoin is originally developed on MapReduce, thus the performance of the algorithm can be affected by the notorious I/O issue of MapReduce while processing multi-round join tasks. Meanwhile, CliqueJoin does not propose a cost evaluation strategy for labelled graphs, which limits its application in practice where most real-world graphs are labelled. Targeting the limitations of CliqueJoin, we propose CliqueJoin++ to improve CliqueJoin in two aspects. Firstly, we implement CliqueJoin++ on the Timely dataflow system instead of MapReduce to avoid considerable I/O cost. Secondly, we extend the cost evaluation function in CliqueJoin to compute optimal join plans for labelled graphs in the distributed context. Extensive experiments have been conducted to show that the proposed method is up to 10 times faster than the MapReduce version for unlabelled matching, and it achieves good performance and scalability for labelled matching.
引用
收藏
页码:266 / 273
页数:8
相关论文
共 39 条
[1]  
Afrati F.N., 2013, P ICDE 13
[2]   Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows [J].
Ammar, Khaled ;
McSherry, Frank ;
Salihoglu, Semih ;
Joglekar, Manas .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (06) :691-704
[3]  
[Anonymous], 2010, HOTCLOUD
[4]   Efficient Subgraph Matching by Postponing Cartesian Products [J].
Bi, Fei ;
Chang, Lijun ;
Lin, Xuemin ;
Qin, Lu ;
Zhang, Wenjie .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1199-1214
[5]  
Bonnici V, 2010, LECT N BIOINFORMAT, V6282, P195, DOI 10.1007/978-3-642-16001-1_17
[6]   Protein-to-Protein Interactions: Technologies, Databases, and Algorithms [J].
Cannataro, Mario ;
Guzzi, Pietro H. ;
Veltri, Pierangelo .
ACM COMPUTING SURVEYS, 2010, 43 (01)
[7]  
Carbone Paris, 2015, The Bulletin of the Technical Committee on Data Engineering, V36, P4
[8]  
Cheng J., 2007, P 2007 ACM SIGMOD IN, P857
[9]  
Chung F, 2003, ANN COMB, V7, P21, DOI DOI 10.1007/S000260300002
[10]  
Cook Diane J, 2006, Mining graph data