Prediction of DNA-binding residues from sequence

Cited by: 129
Authors
Ofran, Yanay
Mysore, Venkatesh
Rost, Burkhard
Affiliations
[1] Columbia Univ, Dept Biochem & Mol Biophys, New York, NY 10032 USA
[2] Columbia Univ, Columbia Univ Ctr Computat Biol & Bioinformat C2B, New York, NY 10032 USA
[3] Columbia Univ, DE Shaw Res, New York, NY 10032 USA
[4] Columbia Univ, NE Struct Genom Consortium NESG, New York, NY 10032 USA
DOI
10.1093/bioinformatics/btm174
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are still unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available.
Results: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein-DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information.
Pages: i347-i353
Number of pages: 7