SING: Subgraph search In Non-homogeneous Graphs

Citations: 40
Authors
Di Natale, Raffaele [1 ]
Ferro, Alfredo [1 ]
Giugno, Rosalba [1 ]
Mongiovi, Misael [1 ]
Pulvirenti, Alfredo [1 ]
Shasha, Dennis [2 ]
Affiliations
[1] Univ Catania, Dipartimento Matemat & Informat, Catania, Italy
[2] NYU, Courant Inst Math Sci, New York, NY USA
Source
BMC Bioinformatics | 2010 / Vol. 11
Funding
National Science Foundation (USA);
Keywords
ALGORITHM;
DOI
10.1186/1471-2105-11-96
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs.
Results: In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task.
Conclusions: Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs.
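The filtering step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes features are label sequences of short simple paths and that a database graph can be discarded when it has fewer occurrences of some feature than the query. This is a generic feature-count filter, not SING's actual index, which additionally exploits feature locality information. The names `path_features` and `may_contain`, the `max_len` parameter and the toy graphs are hypothetical, chosen only for illustration.

```python
from collections import Counter


def path_features(labels, adj, max_len=3):
    """Count label sequences of simple paths with at most max_len vertices.

    labels: dict vertex -> label; adj: dict vertex -> list of neighbours.
    (Hypothetical helper; paths are counted once per direction.)
    """
    feats = Counter()

    def extend(path):
        feats[tuple(labels[v] for v in path)] += 1
        if len(path) < max_len:
            for w in adj[path[-1]]:
                if w not in path:  # keep the path simple
                    extend(path + [w])

    for v in adj:
        extend([v])
    return feats


def may_contain(graph_feats, query_feats):
    """Necessary condition only: every query feature must occur in the
    graph at least as often as in the query."""
    return all(graph_feats[f] >= c for f, c in query_feats.items())


# Toy database (assumed data): g1 is the chain C-C-O, g2 is the edge C-N.
database = {
    "g1": ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]}),
    "g2": ({0: "C", 1: "N"}, {0: [1], 1: [0]}),
}
query = ({0: "C", 1: "O"}, {0: [1], 1: [0]})

# Index the database once, then filter for each query.
index = {name: path_features(*g) for name, g in database.items()}
query_feats = path_features(*query)
candidates = [name for name, f in index.items() if may_contain(f, query_feats)]
print(candidates)
```

Graphs that pass the filter are only candidates; each would still need to be checked by a subgraph isomorphism algorithm, as the abstract describes.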
Pages: 15