DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues

Cited by: 32
Authors
Ma, Xin [1]
Guo, Jing [2]
Sun, Xiao [2]
Affiliations
[1] Nanjing Audit Univ, Sch Sci, Nanjing, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing, Jiangsu, Peoples R China
Source
PLOS ONE | 2016 / Vol. 11 / Issue 12
Funding
National Natural Science Foundation of China;
Keywords
SUPPORT VECTOR MACHINES; SEQUENCE-BASED PREDICTION; AMINO-ACID-COMPOSITION; MRMR FEATURE-SELECTION; RIBOSOMAL-RNA-BINDING; WEB SERVER; INFORMATION; SVM; REDUNDANCY; RELEVANCE;
DOI
10.1371/journal.pone.0167345
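Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract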
DNA-binding proteins are fundamentally important in cellular processes. Several computational methods have been developed in recent years to improve the prediction of DNA-binding proteins. However, insufficient work has been done on predicting DNA-binding proteins from protein sequence information alone. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using a random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect the conservation of the physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues together with the non-binding propensity of non-binding residues. Comparisons across the individual features demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during model construction. The results showed that the DNABP model achieved 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. The high prediction accuracy and the performance comparisons with previous research suggest that DNABP could be a useful approach for identifying DNA-binding proteins from sequence information.
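The four figures quoted in the abstract (accuracy, sensitivity, specificity and the Matthews correlation coefficient) are all derived from a binary confusion matrix. As a minimal sketch of how they relate, the function below computes them from raw counts. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen for illustration only; they are not taken from the paper, and the record does not state how the authors computed their values.

```python
import math


def _safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: binding proteins predicted as binding / non-binding.
    tn/fp: non-binding proteins predicted as non-binding / binding.
    """
    total = tp + fp + tn + fn
    # Denominator of the Matthews correlation coefficient.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, total),
        "sensitivity": _safe_div(tp, tp + fn),  # true positive rate
        "specificity": _safe_div(tn, tn + fp),  # true negative rate
        "mcc": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, for illustration only (not the paper's data).
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is commonly reported alongside sensitivity and specificity when the positive and negative classes differ in size.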
Pages: 20
Related papers
50 in total
  • [31] Affinity selection of DNA-binding proteins displayed on bacteriophage λ
    Zhang, Y
    Pak, JW
    Maruyama, IN
    Machida, M
    JOURNAL OF BIOCHEMISTRY, 2000, 127 (06) : 1057 - 1063
  • [32] Identification of DNA-binding Proteins Using Structural, Electrostatic and Evolutionary Features
    Nimrod, Guy
    Szilagyi, Andras
    Leslie, Christina
    Ben-Tal, Nir
    JOURNAL OF MOLECULAR BIOLOGY, 2009, 387 (04) : 1040 - 1053
  • [33] Identification of DNA-Binding and Protein-Binding Proteins Using Enhanced Graph Wavelet Features
    Zhu, Yuan
    Zhou, Weiqiang
    Dai, Dao-Qing
    Yan, Hong
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (04) : 1017 - 1031
  • [34] DNA-Prot: Identification of DNA Binding Proteins from Protein Sequence Information using Random Forest
    Kumar, K. Krishna
    Pugalenthi, Ganesan
    Suganthan, P. N.
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2009, 26 (06) : 679 - 686
  • [35] StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier
    Zhang, Qingmei
    Liu, Peishun
    Wang, Xue
    Zhang, Yaqun
    Han, Yu
    Yu, Bin
    APPLIED SOFT COMPUTING, 2021, 99
  • [36] An accurate feature-based method for identifying DNA-binding residues on protein surfaces
    Xiong, Yi
    Liu, Juan
    Wei, Dong-Qing
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2011, 79 (02) : 509 - 517
  • [37] Identification and characterization of DNA-binding proteins by mass spectrometry
    Nordhoff, Eckhard
    Lehrach, Hans
    ANALYTICS OF PROTEIN-DNA INTERACTIONS, 2007, 104 : 111 - 195
  • [38] Identification of DNA-Binding Proteins via a Voting Strategy
    Zhang, Jun
    Liu, Bin
    CURRENT PROTEOMICS, 2018, 15 (05) : 363 - 373
  • [39] Identification of centromeric and telomeric DNA-binding proteins in rice
    He, Qi
    Chen, Lei
    Xu, Yu
    Yu, Weichang
    PROTEOMICS, 2013, 13 (05) : 826 - 832
  • [40] Rapid identification of DNA-binding proteins by mass spectrometry
    Nordhoff, E
    Krogsdam, AM
    Jorgensen, HF
    Kallipolitis, BH
    Clark, BFC
    Roepstorff, P
    Kristiansen, K
    NATURE BIOTECHNOLOGY, 1999, 17 (09) : 884 - 888