SASUM: A Sharing-Based Approach to Fast Approximate Subgraph Matching for Large Graphs

被引:0
作者
Kim, Song-Hyon [1 ]
Song, Inchul [1 ]
Lee, Kyong-Ha [1 ]
Lee, Yoon-Joon [1 ]
机构
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
关键词
graph database; approximate subgraph matching; FRAMEWORK; EFFICIENT; ALGORITHM;
D O I
10.1587/transinf.E96.D.624
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Subgraph matching is a fundamental operation for querying graph-structured data. Due to potential errors and noises in real-world graph data, exact subgraph matching is sometimes inappropriate in practice. In this paper we consider an approximate subgraph matching model that allows missing edges. Based on this model, approximate subgraph matching finds all occurrences of a given query graph in a database graph, allowing missing edges. A straightforward approach is to first generate query subgraphs of a given query graph by deleting edges and then perform exact subgraph matching for each query subgraph. In this paper we propose a sharing-based approach to approximate subgraph matching, called SASUM. Our method is based on the fact that query subgraphs are highly overlapped. Due to this overlapping nature of query subgraphs, the matches of a query subgraph can be computed from the matches of a smaller query subgraph, which results in reducing the number of query subgraphs that require expensive exact subgraph matching. Our method uses a lattice framework to identify sharing opportunities between query subgraphs. To further reduce the number of graphs that need exact subgraph matching, SASUM generates small base graphs that are shared by query subgraphs and chooses the minimum number of base graphs whose matches are used to derive the matching results of all query subgraphs. A comprehensive set of experiments shows that our approach outperforms the state-of-the-art approach by orders of magnitude in terms of query execution time.
引用
收藏
页码:624 / 633
页数:10
相关论文
共 22 条
  • [1] [Anonymous], 2001, Introduction to Graph Theory
  • [2] [Anonymous], 1979, Computers and Intractablity: A Guide to the Theory of NP-Completeness
  • [3] [Anonymous], P CIKM
  • [4] [Anonymous], MIT ELECT ENG COMPUT
  • [5] [Anonymous], 2007, P ACM SIGMOD INT C M
  • [6] A (sub)graph isomorphism algorithm for matching large graphs
    Cordella, LP
    Foggia, P
    Sansone, C
    Vento, M
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (10) : 1367 - 1372
  • [7] Kim S, 2011, LECT NOTES COMPUT SC, V6587, P404, DOI 10.1007/978-3-642-20149-3_30
  • [8] The Dynamics of Viral Marketing
    Leskovec, Jure
    Adamic, Lada A.
    Huberman, Bernardo A.
    [J]. ACM TRANSACTIONS ON THE WEB, 2007, 1 (01)
  • [9] Human Protein Reference Database-2009 update
    Prasad, T. S. Keshava
    Goel, Renu
    Kandasamy, Kumaran
    Keerthikumar, Shivakumar
    Kumar, Sameer
    Mathivanan, Suresh
    Telikicherla, Deepthi
    Raju, Rajesh
    Shafreen, Beema
    Venugopal, Abhilash
    Balakrishnan, Lavanya
    Marimuthu, Arivusudar
    Banerjee, Sutopa
    Somanathan, Devi S.
    Sebastian, Aimy
    Rani, Sandhya
    Ray, Somak
    Kishore, C. J. Harrys
    Kanth, Sashi
    Ahmed, Mukhtar
    Kashyap, Manoj K.
    Mohmood, Riaz
    Ramachandra, Y. L.
    Krishna, V.
    Rahiman, B. Abdul
    Mohan, Sujatha
    Ranganathan, Prathibha
    Ramabadran, Subhashri
    Chaerkady, Raghothama
    Pandey, Akhilesh
    [J]. NUCLEIC ACIDS RESEARCH, 2009, 37 : D767 - D772
  • [10] Shasha D., 2002, Proceedings of the twenty-first ACM SIGMODSIGACT-SIGART symposium on Principles of database systems, P39, DOI DOI 10.1145/543613.543620