Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information

Cited by: 52
Authors
Ma, Xin [1 ,2 ]
Guo, Jing [1 ]
Liu, Hong-De [1 ]
Xie, Jian-Ming [1 ]
Sun, Xiao [1 ]
Affiliations
[1] Southeast Univ, State Key Lab Bioelect, Sch Biol Sci & Med Engn, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Audit Univ, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA-binding residues; random forest; physicochemical property; evolutionary information; WEB SERVER; SITES; IDENTIFICATION; EVOLUTIONARY; PARAMETERS; DISCOVERY; TOOL;
DOI
10.1109/TCBB.2012.106
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Recognizing DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. A prediction method, DNABR (DNA Binding Residues), is therefore proposed for predicting DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are introduced in this study. The first combines evolutionary information with the conservation of the physicochemical properties of the amino acids. The second reflects the correlation between amino acids at different sequence positions, that is, their dependency with regard to polarity-charge and hydrophobic properties. These two features, together with an orthogonal binary vector that reflects the characteristics of the 20 types of amino acids, are used to build the DNABR model. DNABR achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with 68.47 percent sensitivity (SE) and 98.16 percent specificity (SP). Comparisons across the individual features demonstrate that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches show that DNABR performs well at detecting binding residues in putative DNA-binding proteins. The DNABR web server is freely available at http://www.cbi.seu.edu.cn/DNABR/.
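The four measures reported in the abstract (SE, SP, ACC, MCC) are standard quantities derived from a binary confusion matrix. The sketch below shows how they are usually computed; it is not taken from the DNABR paper or web server. The function name `binary_metrics` and the example counts are hypothetical, chosen only for illustration, and do not reproduce the paper's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute SE, SP, ACC and MCC from confusion-matrix counts.

    tp/fn: binding residues predicted correctly / missed.
    tn/fp: non-binding residues predicted correctly / wrongly flagged.
    Assumes at least one positive and one negative example.
    """
    se = tp / (tp + fn)                      # sensitivity (recall on binding residues)
    sp = tn / (tn + fp)                      # specificity (recall on non-binding residues)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, not the paper's results.
example = binary_metrics(tp=50, fn=50, tn=90, fp=10)
```

Because binding residues are typically a small minority of all residues, accuracy and specificity can be high even when sensitivity is modest, which is why MCC is commonly reported alongside them.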
Pages: 1766-1775
Page count: 10