A Threading-Based Method for the Prediction of DNA-Binding Proteins with Application to the Human Genome
Times cited: 74
Authors:
Gao, Mu [1]
Skolnick, Jeffrey [1]
Affiliations:
[1] Georgia Inst Technol, Sch Biol, Ctr Study Syst Biol, Atlanta, GA 30332 USA
Keywords:
ORIGIN RECOGNITION COMPLEX;
TO-AUTOINTEGRATION FACTOR;
ZINC-FINGER DOMAINS;
DROSOPHILA-MELANOGASTER;
FUNCTIONAL ANNOTATION;
STRUCTURE ALIGNMENT;
ORC6;
PROTEIN;
SEQUENCE;
DATABASE;
CLASSIFICATION;
DOI:
10.1371/journal.pcbi.1000567
Chinese Library Classification:
Q5 [Biochemistry];
Discipline classification codes:
071010 ;
081704 ;
Abstract:
Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that ~30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that ~20% of classic zinc finger domains play a functional role not related to direct DNA-binding.
Pages: 15