Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

Cited by: 8
Authors
Kelil, Abdellali
Dubreuil, Benjamin
Levy, Emmanuel D.
Michnick, Stephen W.
Affiliations
[1] Univ Montreal, Dept Biochim, Montreal, PQ H3C 3J7, Canada
[2] Univ Montreal, Ctr Robert Cedergren, Montreal, PQ, Canada
Funding
Canadian Institutes of Health Research
Keywords
WEB SERVER; PREDICTION; DATABASE; DISORDER; IDENTIFICATION; DOMAINS; REGIONS;
DOI
10.1371/journal.pone.0106081
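Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code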
07 ; 0710 ; 09 ;
Abstract
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked in one motif that varies in three key parameters: (i) the number of occurrences, (ii) the length, and (iii) the number of degenerate or "wildcard" positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark with an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, even though none of these properties were used as prior information.
MotifHound is available (http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound) together with the benchmark that can be used as a reference to assess future developments in motif discovery.
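The scoring idea described in the abstract (enumerate a motif and its degenerate forms, then test overrepresentation against a background with the hypergeometric distribution) can be illustrated with a minimal sketch. This is not MotifHound's implementation: the wildcard symbol `x`, the restriction of wildcards to internal positions, the function names, and the toy sequences are all assumptions made for illustration.

```python
import re
from itertools import combinations
from math import comb


def degenerate_forms(kmer: str, max_wildcards: int) -> set[str]:
    """All forms of kmer with up to max_wildcards internal positions set to 'x'.

    Assumption: wildcards are only placed at internal positions, so the
    first and last residues stay defined.
    """
    internal = range(1, len(kmer) - 1)
    forms = set()
    for n_wild in range(max_wildcards + 1):
        for positions in combinations(internal, n_wild):
            chars = list(kmer)
            for pos in positions:
                chars[pos] = "x"
            forms.add("".join(chars))
    return forms


def count_proteins_with_motif(motif: str, sequences: list[str]) -> int:
    """Number of sequences containing the motif at least once ('x' matches any residue)."""
    pattern = re.compile("".join("." if c == "x" else re.escape(c) for c in motif))
    return sum(1 for seq in sequences if pattern.search(seq))


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k motif-containing proteins in a set
    of n proteins drawn from a background of N, of which K contain the motif."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy data (hypothetical sequences, not taken from the paper's benchmark).
background = [
    "APPLPKR", "MKPALPGG", "GGGGSGGG", "MSTNPKAA",
    "QQPSLPWW", "AAAAKAAA", "LLLLDLLL", "TTTTTTTT",
]
interest = ["APPLPKR", "MKPALPGG", "QQPSLPWW", "GGGGSGGG"]

motif = "PxLP"
k = count_proteins_with_motif(motif, interest)     # proteins of interest with the motif
K = count_proteins_with_motif(motif, background)   # background proteins with the motif
p_value = hypergeom_tail(k, len(interest), K, len(background))
```

A smaller `p_value` indicates stronger overrepresentation of the motif in the proteins of interest. An exhaustive search in this spirit would apply `degenerate_forms` to every k-mer found in the proteins of interest and rank all resulting motifs by this score.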
Pages: 11