Biomolecular network motif counting and discovery by color coding

被引:151
作者
Alon, Noga [2 ,3 ]
Dao, Phuong [1 ]
Hajirasouliha, Iman [1 ]
Hormozdiari, Fereydoun [1 ]
Sahinalp, S. Cenk [1 ]
机构
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[2] Tel Aviv Univ, Sch Math Sci, Ramat Aviv, Israel
[3] Tel Aviv Univ, Sch Comp Sci, Ramat Aviv, Israel
关键词
D O I
10.1093/bioinformatics/btn163
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Protein-protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet, some of these networks can differ significantly from the others in terms of local structures: e.g. the number of specific network motifs can vary significantly among PPI networks. Counting the number of network motifs provides a major challenge to compare biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k <= 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or counting subgraphs with k >= 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the 'color coding' technique for counting non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of motif G' with k vertices in a network G with n vertices in time polynomial with n, provided k=O(log n). We use our algorithm to obtain 'treelet' distributions for k <= 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae Escherichia coli and Helicobacter Pyloris), which are all quite similar, and a multicellular organism (Caenorhabditis elegans) which is significantly different. Furthermore, the treelet distribution of the unicellular organisms are similar to that obtained by the 'duplication model' but are quite different from that of the 'preferential attachment model'. The treelet distribution is robust w.r.t. sparsification with bait/edge coverage of 70% but differences can be observed when bait/edge coverage drops to 50%.
引用
收藏
页码:I241 / I249
页数:9
相关论文
共 22 条
[1]   A random graph model for power law graphs [J].
Aiello, W ;
Chung, F ;
Lu, LY .
EXPERIMENTAL MATHEMATICS, 2001, 10 (01) :53-66
[2]   COLOR-CODING [J].
ALON, N ;
YUSTER, R ;
ZWICK, U .
JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY, 1995, 42 (04) :844-856
[3]  
ALON N, 2007, P ICALP, P435
[4]  
ARVIND V, 2002, P 13 INT S ALG COMP, P453
[5]   Emergence of scaling in random networks [J].
Barabási, AL ;
Albert, R .
SCIENCE, 1999, 286 (5439) :509-512
[6]   The degree distribution of the generalized duplication model [J].
Bebek, G. ;
Berenbrink, P. ;
Cooper, C. ;
Friedetzky, T. ;
Nadeau, J. ;
Sahinalp, S. C. .
THEORETICAL COMPUTER SCIENCE, 2006, 369 (1-3) :239-249
[7]   The degree sequence of a scale-free random graph process [J].
Bollobás, B ;
Riordan, O ;
Spencer, J ;
Tusnády, G .
RANDOM STRUCTURES & ALGORITHMS, 2001, 18 (03) :279-290
[8]   Duplication models for biological networks [J].
Chung, F ;
Lu, LY ;
Dewey, TG ;
Galas, DJ .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (05) :677-687
[9]  
Dost B, 2007, LECT NOTES COMPUT SC, V4453, P1
[10]   Preferential attachment in the protein network evolution [J].
Eisenberg, E ;
Levanon, EY .
PHYSICAL REVIEW LETTERS, 2003, 91 (13) :138701-138701