Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

被引：25

作者：

Chen, Xuhao ^{[1
]}

Dathathri, Roshan ^{[2
]}

Gill, Gurbinder ^{[2
]}

Hoang, Loc ^{[3
]}

Pingali, Keshav ^{[3
]}

机构：

[1] MIT, CSAIL, Cambridge, MA 02139 USA

[2] Katana Graph, Austin, TX USA

[3] UT Austin, Austin, TX USA

来源：

PROCEEDINGS OF THE 2021 ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2021 | 2021年

关键词：

graph pattern mining; programming framework; search space pruning; performance optimization; SUBGRAPH ENUMERATION; DISCOVERY;

D O I：

10.1145/3447818.3460359

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Graph pattern mining (GPM) is a key building block in diverse applications, including bioinformatics, chemical engineering, social network analysis, recommender systems and security. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide expressive low-level interfaces at the cost of increased programming complexity. They also lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications. We present Sandslash, an in-memory graph pattern mining framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that only needs a specification of the GPM problem from the user, and the system implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash using five GPM applications and a wide range of real-world graphs. Experimental results demonstrate that applications written using Sandslash's high-level or low-level API outperform those in state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.

引用

页码：378 / 391

页数：14

共 64 条

[1]

Abdelhamid E, 2016, SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, P716, DOI 10.1109/SC.2016.60

[2] EmptyHeaded: A Relational Engine for Graph Processing [J].

Aberger, Christopher R. ;

Lamb, Andrew ;

Tu, Susan ;

Noetzli, Andres ;

Olukotun, Kunle ;

Re, Christopher .

ACM TRANSACTIONS ON DATABASE SYSTEMS, 2017, 42 (04)

[3]

Aggarwal CC, 2010, ADV DATABASE SYST, V40, P1, DOI 10.1007/978-1-4419-6045-0

[4] Efficient Graphlet Counting for Large Networks [J].

Ahmed, Nesreen K. ;

Neville, Jennifer ;

Rossi, Ryan A. ;

Duffield, Nick .

2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, :1-10

[5] Biomolecular network motif counting and discovery by color coding [J].

Alon, Noga ;

Dao, Phuong ;

Hajirasouliha, Iman ;

Hormozdiari, Fereydoun ;

Sahinalp, S. Cenk .

BIOINFORMATICS, 2008, 24 (13) :I241-I249

[6]

[Anonymous], 2004, P 13 INT C WORLD WID

[7]

Beamer S., 2015, arXiv

[8] Higher-order organization of complex networks [J].

Benson, Austin R. ;

Gleich, David F. ;

Leskovec, Jure .

SCIENCE, 2016, 353 (6295) :163-166

[9] CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching [J].

Bhattarai, Bibek ;

Liu, Hang ;

Huang, H. Howie .

SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, :1447-1462

[10] Efficient Subgraph Matching by Postponing Cartesian Products [J].

Bi, Fei ;

Chang, Lijun ;

Lin, Xuemin ;

Qin, Lu ;

Zhang, Wenjie .

SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1199-1214

← 1 2 3 4 5 6 7 →