SING: Subgraph search In Non-homogeneous Graphs

Cited by: 40
Authors
Di Natale, Raffaele [1]
Ferro, Alfredo [1]
Giugno, Rosalba [1]
Mongiovi, Misael [1]
Pulvirenti, Alfredo [1]
Shasha, Dennis [2]
Affiliations
[1] Univ Catania, Dipartimento Matemat & Informat, Catania, Italy
[2] NYU, Courant Inst Math Sci, New York, NY USA
Funding
National Science Foundation (USA);
Keywords
ALGORITHM;
DOI
10.1186/1471-2105-11-96
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs.

Results: In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task.

Conclusions: Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs.
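
The filter-and-verify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the SING implementation: it assumes label paths of at most three nodes as the features, stores for each node the set of features that start there as a simple stand-in for "feature locality information", and uses a plain backtracking matcher for verification. All function names, the graph representation (a label dict plus an adjacency dict) and the toy data are hypothetical.

```python
from collections import Counter


def path_features(labels, adj, max_nodes=3):
    """Return (global feature counts, per-node feature sets) for label paths."""
    counts = Counter()
    local = {v: set() for v in labels}

    def extend(start, path):
        feature = tuple(labels[v] for v in path)
        counts[feature] += 1          # global occurrence count
        local[start].add(feature)     # locality: feature starts at `start`
        if len(path) < max_nodes:
            for nxt in adj[path[-1]]:
                if nxt not in path:   # simple paths only
                    extend(start, path + [nxt])

    for v in labels:
        extend(v, [v])
    return counts, local


def candidate_nodes(query_local, graph_local):
    """For each query node, list graph nodes whose local features cover its own."""
    return {
        q: [g for g, feats in graph_local.items() if q_feats <= feats]
        for q, q_feats in query_local.items()
    }


def passes_filter(query_index, graph_index):
    """Discard graphs that cannot contain the query."""
    q_counts, q_local = query_index
    g_counts, g_local = graph_index
    # Global test: every query feature must occur at least as often in the graph.
    if any(g_counts[f] < n for f, n in q_counts.items()):
        return False
    # Locality test: every query node needs at least one compatible graph node.
    return all(candidate_nodes(q_local, g_local).values())


def find_match(q_labels, q_adj, g_adj, candidates):
    """Backtracking subgraph matching restricted to the candidate nodes."""
    order = sorted(q_labels, key=lambda q: len(candidates[q]))
    mapping = {}

    def backtrack(i):
        if i == len(order):
            return True
        q = order[i]
        for g in candidates[q]:
            if g in mapping.values():
                continue
            # Every already-mapped query neighbour must map to a neighbour of g.
            if all(mapping[n] in g_adj[g] for n in q_adj[q] if n in mapping):
                mapping[q] = g
                if backtrack(i + 1):
                    return True
                del mapping[q]
        return False

    return dict(mapping) if backtrack(0) else None


def search(database, query):
    """Return the names of database graphs that contain the query."""
    q_labels, q_adj = query
    q_index = path_features(q_labels, q_adj)
    hits = []
    for name, (g_labels, g_adj) in database.items():
        g_index = path_features(g_labels, g_adj)  # a real index is built offline
        if not passes_filter(q_index, g_index):
            continue
        cands = candidate_nodes(q_index[1], g_index[1])
        if find_match(q_labels, q_adj, g_adj, cands) is not None:
            hits.append(name)
    return hits


# Toy data: g1 is the chain C-C-O, g2 is C-N, and the query is C-O.
database = {
    "g1": ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}}),
    "g2": ({0: "C", 1: "N"}, {0: {1}, 1: {0}}),
}
query = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
print(search(database, query))
```

In this sketch the per-node feature sets serve both purposes named in the abstract: they tighten the filter, and they shrink the set of nodes the matcher has to try. On the toy data, only node 1 of g1 remains a candidate for the query's carbon atom, because node 0 has no C-O path starting from it.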
Pages: 15