A Threading-Based Method for the Prediction of DNA-Binding Proteins with Application to the Human Genome

Cited by: 74
Authors
Gao, Mu [1]
Skolnick, Jeffrey [1]
Affiliations
[1] Georgia Inst Technol, Sch Biol, Ctr Study Syst Biol, Atlanta, GA 30332 USA
Keywords
ORIGIN RECOGNITION COMPLEX; TO-AUTOINTEGRATION FACTOR; ZINC-FINGER DOMAINS; DROSOPHILA-MELANOGASTER; FUNCTIONAL ANNOTATION; STRUCTURE ALIGNMENT; ORC6; PROTEIN; SEQUENCE; DATABASE; CLASSIFICATION;
DOI
10.1371/journal.pcbi.1000567
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity to the target is less than 30%, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than that of the standard sequence comparison method PSI-BLAST and is comparable to that of DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone root-mean-square deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigns the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that ~30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that ~20% of classic zinc finger domains play a functional role not related to direct DNA binding.
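The benchmark figures in the abstract can be unpacked with a small sketch of how sensitivity and precision are defined. The counts below are not reported in this record; they are back-calculated from the rounded 56%/86% figures and the 179 DNA-binding proteins, so they should be read as approximate illustrations only. The function and variable names are assumptions made for this example.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true DNA-binding proteins that are predicted as such."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted DNA-binding proteins that are truly DNA-binding."""
    return tp / (tp + fp)


# Values stated in the abstract.
n_binding = 179
reported_sensitivity = 0.56
reported_precision = 0.86

# Assumption: approximate counts inferred from the rounded percentages,
# not taken from the paper itself.
tp = round(reported_sensitivity * n_binding)                     # true positives
fn = n_binding - tp                                              # missed binders
fp = round(tp * (1 - reported_precision) / reported_precision)   # false positives

print(f"TP ~ {tp}, FN ~ {fn}, FP ~ {fp}")
print(f"sensitivity ~ {sensitivity(tp, fn):.2f}")
print(f"precision   ~ {precision(tp, fp):.2f}")
```

Under these inferred counts, the high precision corresponds to only a handful of false positives among the 3,797 non-DNA-binding proteins, while the moderate sensitivity reflects the binders that no template below 30% sequence identity could recover.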
Pages: 15