Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

被引:25
作者
Chen, Xuhao [1 ]
Dathathri, Roshan [2 ]
Gill, Gurbinder [2 ]
Hoang, Loc [3 ]
Pingali, Keshav [3 ]
机构
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Katana Graph, Austin, TX USA
[3] UT Austin, Austin, TX USA
来源
PROCEEDINGS OF THE 2021 ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2021 | 2021年
关键词
graph pattern mining; programming framework; search space pruning; performance optimization; SUBGRAPH ENUMERATION; DISCOVERY;
D O I
10.1145/3447818.3460359
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Graph pattern mining (GPM) is a key building block in diverse applications, including bioinformatics, chemical engineering, social network analysis, recommender systems and security. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide expressive low-level interfaces at the cost of increased programming complexity. They also lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications. We present Sandslash, an in-memory graph pattern mining framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that only needs a specification of the GPM problem from the user, and the system implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash using five GPM applications and a wide range of real-world graphs. Experimental results demonstrate that applications written using Sandslash's high-level or low-level API outperform those in state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.
引用
收藏
页码:378 / 391
页数:14
相关论文
共 64 条
[1]  
Abdelhamid E, 2016, SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, P716, DOI 10.1109/SC.2016.60
[2]   EmptyHeaded: A Relational Engine for Graph Processing [J].
Aberger, Christopher R. ;
Lamb, Andrew ;
Tu, Susan ;
Noetzli, Andres ;
Olukotun, Kunle ;
Re, Christopher .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2017, 42 (04)
[3]  
Aggarwal CC, 2010, ADV DATABASE SYST, V40, P1, DOI 10.1007/978-1-4419-6045-0
[4]   Efficient Graphlet Counting for Large Networks [J].
Ahmed, Nesreen K. ;
Neville, Jennifer ;
Rossi, Ryan A. ;
Duffield, Nick .
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, :1-10
[5]   Biomolecular network motif counting and discovery by color coding [J].
Alon, Noga ;
Dao, Phuong ;
Hajirasouliha, Iman ;
Hormozdiari, Fereydoun ;
Sahinalp, S. Cenk .
BIOINFORMATICS, 2008, 24 (13) :I241-I249
[6]  
[Anonymous], 2004, P 13 INT C WORLD WID
[7]  
Beamer S., 2015, arXiv
[8]   Higher-order organization of complex networks [J].
Benson, Austin R. ;
Gleich, David F. ;
Leskovec, Jure .
SCIENCE, 2016, 353 (6295) :163-166
[9]   CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching [J].
Bhattarai, Bibek ;
Liu, Hang ;
Huang, H. Howie .
SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, :1447-1462
[10]   Efficient Subgraph Matching by Postponing Cartesian Products [J].
Bi, Fei ;
Chang, Lijun ;
Lin, Xuemin ;
Qin, Lu ;
Zhang, Wenjie .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1199-1214