Predicting DNA-binding sites of proteins based on sequential and 3D structural information

Cited by: 0
Authors
Bi-Qing Li
Kai-Yan Feng
Juan Ding
Yu-Dong Cai
Affiliations
[1] Shanghai University, Institute of Systems Biology
[2] Chinese Academy of Sciences, Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences
[3] Shanghai Center for Bioinformation Technology, Schepens Eye Research Institute
[4] Beijing Genomics Institute
[5] Shenzhen Beishan Industrial Zone
[6] Harvard Medical School
Source
Molecular Genetics and Genomics | 2014 / Vol. 289
Keywords
Protein-DNA interactions; Structural features; Random Forest (RF); Maximum relevance minimum redundancy (mRMR); Incremental feature selection (IFS)
DOI
Not available
Abstract
Protein–DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein–DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein–DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on a support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. To predict the protein–DNA interaction sites, we incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data. Feature analysis showed that the 3D structural features did contribute to the prediction of DNA-binding sites, and prediction performance was better with the 3D structural features than without them. Analysis of the features from each site also showed that the features of the DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying DNA-binding sites, and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein–DNA interaction.
Pages: 489-499
Page count: 10