ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph

被引:0
作者
Abdelhamid, Ehab [1 ]
Abdelaziz, Ibrahim [1 ]
Kalnis, Panos [1 ]
Khayyat, Zuhair [1 ]
Jamour, Fuad [1 ]
机构
[1] KAUST, Comp Elect & Math Sci & Engn Div, Extreme Comp Res Ctr, Thuwal, Saudi Arabia
来源
SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2016年
关键词
PATTERNS;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches either suffer from load imbalance, or high communication and synchronization overheads. In this paper we propose ScaleMine; a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces a novel two- phase approach. The first phase is approximate; it quickly identifies subgraphs that are frequent with high probability, while collecting various statistics. The second phase computes the exact solution by employing the results of the approximation to achieve good load balance; prune the search space; generate efficient execution plans; and guide intra-task parallelism. Our experiments show that ScaleMine scales to 8,192 cores on a Cray XC40 (12x more than competitors); supports graphs with one billion edges (10x larger than competitors), and is at least an order of magnitude faster than existing solutions.
引用
收藏
页码:716 / 726
页数:11
相关论文
共 42 条
  • [1] [Anonymous], 1995, Mathematical Statistics and Data Analysis
  • [2] [Anonymous], COMPUTER J
  • [3] [Anonymous], 2014, RECONNAISSANCE FORME
  • [4] [Anonymous], 2004, P S OP SYST DES IMPL
  • [5] An Iterative MapReduce Based Frequent Subgraph Mining Algorithm
    Bhuiyan, Mansurul A.
    Al Hasan, Mohammad
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (03) : 608 - 620
  • [6] Bringmann B, 2008, LECT NOTES ARTIF INT, V5012, P858, DOI 10.1007/978-3-540-68125-0_84
  • [7] Buehrer G, 2006, IEEE DATA MINING, P97
  • [8] gApprox: Mining frequent approximate patterns from a massive network
    Chen, Chen
    Yan, Xifeng
    Zhu, Feida
    Han, Jiawei
    [J]. ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 445 - +
  • [9] Predicting Protein Function by Frequent Functional Association Pattern Mining in Protein Interaction Networks
    Cho, Young-Rae
    Zhang, Aidong
    [J]. IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, 2010, 14 (01): : 30 - 36
  • [10] Chu W.T., 2012, P 2 ACM INT C MULT R, P27