CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching

被引:96
作者
Bhattarai, Bibek [1 ]
Liu, Hang [2 ]
Huang, H. Howie [1 ]
机构
[1] George Washington Univ, Washington, DC 20052 USA
[2] Univ Massachusetts Lowell, Lowell, MA USA
来源
SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2019年
基金
美国国家科学基金会;
关键词
subgraph listing; subgraph matching; graph pattern matching; subgraph isomorphism; CECI; extreme cluster; NETWORK MOTIF DISCOVERY; EFFICIENT ALGORITHM; QUERY OPTIMIZATION; ISOMORPHISM; ENUMERATION;
D O I
10.1145/3299869.3300086
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Subgraph matching finds all distinct isomorphic embeddings of a query graph on a data graph. For large graphs, current solutions face the scalability challenge due to expensive joins, excessive false candidates, and workload imbalance. In this paper, we propose a novel framework for subgraph listing based on Compact Embedding Cluster Index (CECI), which divides the data graph into multiple embedding clusters for parallel processing. The CECI system has three unique techniques: utilizing the BFS-based filtering and reverse-BFSbased refinement to prune the unpromising candidates early on, replacing the edge verification with set intersection to speed up the candidate verification, and using search cardinality based cost estimation for detecting and dividing large embedding clusters in advance. The experiments performed on several real and synthetic datasets show that the CECI system outperforms state-of-the-art solutions on average by 20.4x for listing all embeddings and by 2.6x for enumerating the first 1,024 embeddings.
引用
收藏
页码:1447 / 1462
页数:16
相关论文
共 58 条
[1]  
Abdelhamid E, 2016, SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, P716, DOI 10.1109/SC.2016.60
[2]  
Afrati FN, 2013, PROC INT CONF DATA, P62, DOI 10.1109/ICDE.2013.6544814
[3]   Biomolecular network motif counting and discovery by color coding [J].
Alon, Noga ;
Dao, Phuong ;
Hajirasouliha, Iman ;
Hormozdiari, Fereydoun ;
Sahinalp, S. Cenk .
BIOINFORMATICS, 2008, 24 (13) :I241-I249
[4]  
[Anonymous], 2018, 2018 IEEE INT C COMM
[5]  
[Anonymous], ABS181204070 CORR
[6]  
[Anonymous], ABS180109240 COMP RE
[7]  
[Anonymous], P 10 USENIX S OP SYS
[8]  
[Anonymous], TRICORE PARALLEL TRI
[9]  
[Anonymous], P 17 US C FIL STOR T
[10]   Efficient Subgraph Matching by Postponing Cartesian Products [J].
Bi, Fei ;
Chang, Lijun ;
Lin, Xuemin ;
Qin, Lu ;
Zhang, Wenjie .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1199-1214