The Proteome Folding Project: Proteome-scale prediction of structure and function

Cited by: 31
Authors
Drew, Kevin [1 ]
Winters, Patrick [1 ]
Butterfoss, Glenn L. [1 ]
Berstis, Viktors [2 ]
Uplinger, Keith [2 ]
Armstrong, Jonathan [2 ]
Riffle, Michael [3 ]
Schweighofer, Erik [4 ]
Bovermann, Bill [2 ]
Goodlett, David R. [5 ]
Davis, Trisha N. [3 ]
Shasha, Dennis [6 ]
Malmstroem, Lars [7 ]
Bonneau, Richard [1 ,4 ,6 ]
Affiliations
[1] NYU, Dept Biol, Ctr Genom & Syst Biol, New York, NY 10003 USA
[2] IBM Corp, Austin, TX 78758 USA
[3] Univ Washington, Dept Genome Sci, Dept Biochem, Seattle, WA 98195 USA
[4] Inst Syst Biol, Seattle, WA 98103 USA
[5] Univ Washington, Dept Med Chem, Seattle, WA 98195 USA
[6] NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY 10003 USA
[7] ETH, Inst Mol Syst Biol, CH-8093 Zurich, Switzerland
Keywords
GENE-FUNCTION PREDICTION; DEINOCOCCUS-RADIODURANS; PROTEINS; DATABASE; GENOME; SEQUENCE; MODELS; CLASSIFICATION; DIVERGENCE; NETWORK;
DOI
10.1101/gr.121475.111
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.
Pages: 1981-1994
Page count: 14
Related papers
61 in total
[1]   Rumi is a CAP10 domain glycosyltransferase that modifies notch and is required for notch signaling [J].
Acar, Melih ;
Jafar-Nejad, Hamed ;
Takeuchi, Hideyuki ;
Rajan, Akhila ;
Ibrani, Dafina ;
Rana, Nadia A. ;
Pan, Hongling ;
Haltiwanger, Robert S. ;
Bellen, Hugo J. .
CELL, 2008, 132 (02) :247-258
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]   Peptide-N-glycanases and DNA repair proteins, Xp-C/Rad4, are, respectively, active and inactivated enzymes sharing a common transglutaminase fold [J].
Anantharaman, V ;
Koonin, EV ;
Aravind, L .
HUMAN MOLECULAR GENETICS, 2001, 10 (16) :1627-1630
[4]  
[Anonymous], GRID
[5]  
[Anonymous], 2007, PLOS BIOL, DOI 10.1371/journal.pbio.0050016
[6]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[7]   BioNetBuilder: automatic integration of biological networks [J].
Avila-Campillo, Iliana ;
Drew, Kevin ;
Lin, John ;
Reiss, David J. ;
Bonneau, Richard .
BIOINFORMATICS, 2007, 23 (03) :392-393
[8]   Analyzing yeast protein-protein interaction data obtained from different sources [J].
Bader, GD ;
Hogue, CWV .
NATURE BIOTECHNOLOGY, 2002, 20 (10) :991-997
[9]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI 10.1093/nar/gkh121
[10]   Improved prediction of signal peptides: SignalP 3.0 [J].
Bendtsen, JD ;
Nielsen, H ;
von Heijne, G ;
Brunak, S .
JOURNAL OF MOLECULAR BIOLOGY, 2004, 340 (04) :783-795