Arabesque: A System for Distributed Graph Mining

Cited by: 150
Authors
Teixeira, Carlos H. C. [1]
Fonseca, Alexandre J. [1]
Serafini, Marco [1]
Siganos, Georgos [1]
Zaki, Mohammed J. [1]
Aboulnaga, Ashraf [1]
Affiliation
[1] Qatar Computing Research Institute, HBKU, Ar Rayyan, Qatar
Source
SOSP'15: PROCEEDINGS OF THE TWENTY-FIFTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES | 2015
DOI
10.1145/2815400.2815410
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.
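The filter-process model described in the abstract can be illustrated with a small, single-machine Python sketch. It is not Arabesque's API (Arabesque itself is a distributed platform), and every name below (`explore`, `filter_fn`, `process_fn`, `find_cliques`, `count_motifs`, `build_graph`, `induced_edges`) is hypothetical. Subgraphs are connected vertex sets grown one neighbouring vertex at a time; a subgraph that fails the filter is neither processed nor extended. Duplicates are dropped with a per-step set, a simplification that does not reproduce how Arabesque avoids redundant exploration.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical types for this sketch; not taken from Arabesque.
Graph = Dict[int, Set[int]]
Embedding = FrozenSet[int]
FilterFn = Callable[[Graph, Embedding], bool]
ProcessFn = Callable[[Graph, Embedding], None]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build undirected adjacency sets from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def explore(graph: Graph, filter_fn: FilterFn,
            process_fn: ProcessFn, max_size: int) -> None:
    """Hypothetical filter-process loop over connected vertex sets.

    A set that fails filter_fn is neither processed nor extended.
    Duplicates are removed with a per-step set (a simplification).
    """
    frontier: Set[Embedding] = {frozenset([v]) for v in graph}
    size = 1
    while frontier and size <= max_size:
        next_frontier: Set[Embedding] = set()
        for emb in frontier:
            if not filter_fn(graph, emb):
                continue  # pruned: no output, no extension
            process_fn(graph, emb)
            if size < max_size:
                # Extend by any vertex adjacent to the current set.
                neighbours = set().union(*(graph[v] for v in emb)) - emb
                next_frontier.update(emb | {u} for u in neighbours)
        frontier = next_frontier
        size += 1


def induced_edges(graph: Graph, emb: Embedding) -> int:
    """Number of edges of the subgraph induced by emb."""
    return sum(1 for u, v in combinations(emb, 2) if v in graph[u])


def find_cliques(graph: Graph, max_size: int) -> List[Tuple[int, ...]]:
    """Cliques with 3 to max_size vertices, as sorted tuples."""
    found: List[Tuple[int, ...]] = []

    def is_clique(g: Graph, emb: Embedding) -> bool:
        k = len(emb)
        return induced_edges(g, emb) == k * (k - 1) // 2

    def record(g: Graph, emb: Embedding) -> None:
        if len(emb) >= 3:
            found.append(tuple(sorted(emb)))

    explore(graph, is_clique, record, max_size)
    return sorted(found)


def count_motifs(graph: Graph) -> Counter:
    """Count connected 3-vertex induced subgraphs by shape."""
    counts: Counter = Counter()

    def record(g: Graph, emb: Embedding) -> None:
        if len(emb) == 3:
            # A connected 3-vertex subgraph has either 2 or 3 edges.
            counts["triangle" if induced_edges(g, emb) == 3 else "path"] += 1

    explore(graph, lambda g, emb: True, record, max_size=3)
    return counts


if __name__ == "__main__":
    g = build_graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4)])
    print(find_cliques(g, 4))  # [(0, 1, 2), (0, 2, 3)]
    print(count_motifs(g))     # Counter({'path': 4, 'triangle': 2})
```

Pruning in `find_cliques` is safe because any superset of a non-clique is also a non-clique; the motif counter instead accepts every connected subgraph and only classifies those with three vertices.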
Pages: 425-440
Number of pages: 16
Related papers
  • [1] Graph Data Mining with Arabesque
    Hussein, Eslam
    Ghanem, Abdurrahman
    dos Santos Dias, Vinicius Vitor
    Teixeira, Carlos H. C.
    AbuOda, Ghadeer
    Serafini, Marco
    Siganos, Georgos
    Morales, Gianmarco De Francisci
    Aboulnaga, Ashraf
    Zaki, Mohammed
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 1647 - 1650
  • [2] A distributed approach for graph mining in massive networks
    Talukder, N.
    Zaki, M. J.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2016, 30 (05) : 1024 - 1052
  • [4] Tesseract: Distributed, General Graph Pattern Mining on Evolving Graphs
    Bindschaedler, Laurent
    Malicevic, Jasmina
    Lepers, Baptiste
    Goel, Ashvin
    Zwaenepoel, Willy
    PROCEEDINGS OF THE SIXTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '21), 2021, : 458 - 473
  • [5] Mining globally distributed frequent subgraphs in a single labeled graph
    Jiang, Xing
    Xiong, Hui
    Wang, Chen
    Tan, Ah-Hwee
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (10) : 1034 - 1058
  • [6] Distributed frequent subgraph mining on evolving graph using SPARK
    Senthilselvan, N.
    Subramaniyaswamy, V.
    Vijayakumar, V.
    Karimi, Hamid Reza
    Aswin, N.
    Ravi, Logesh
    INTELLIGENT DATA ANALYSIS, 2020, 24 (03) : 495 - 513
  • [7] Khuzdul: Efficient and Scalable Distributed Graph Pattern Mining Engine
    Chen, Jingji
    Qian, Xuehai
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 2, ASPLOS 2023, 2023, : 413 - 426
  • [8] Efficient Distributed Dynamic Graph System
    Zaki, Aya
    Attia, Mahmoud
    Hegazy, Doaa
    Amin, Safaa
    2015 IEEE SEVENTH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INFORMATION SYSTEMS (ICICIS), 2015, : 465 - 471
  • [9] Implementation of a distributed data mining system
    Cho, J
    Baik, S
    Bala, J
    COMPUTATIONAL SCIENCE - ICCS 2005, PT 3, 2005, 3516 : 1016 - 1019
  • [10] A distributed and mobile data mining system
    Wang, F
    Na, HL
    Guo, Y
    Jin, H
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT'2003, PROCEEDINGS, 2003, : 916 - 918