DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues

Cited by: 32
Authors
Ma, Xin [1]
Guo, Jing [2]
Sun, Xiao [2]
Affiliations
[1] Nanjing Audit Univ, Sch Sci, Nanjing, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing, Jiangsu, Peoples R China
Source
PLOS ONE | 2016 / Vol. 11 / Issue 12
Funding
National Natural Science Foundation of China
Keywords
SUPPORT VECTOR MACHINES; SEQUENCE-BASED PREDICTION; AMINO-ACID-COMPOSITION; MRMR FEATURE-SELECTION; RIBOSOMAL-RNA-BINDING; WEB SERVER; INFORMATION; SVM; REDUNDANCY; RELEVANCE;
DOI
10.1371/journal.pone.0167345
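Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract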
DNA-binding proteins are fundamentally important in cellular processes. Several computational methods for predicting DNA-binding proteins have been developed in recent years. However, insufficient work has been done on predicting DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using a random forest (RF) classifier with a hybrid feature. The hybrid feature contains two novel types of sequence features: one reflects the conservation of the physicochemical properties of the amino acids, and the other reflects the binding propensity of DNA-binding residues and the non-binding propensity of non-binding residues. Comparisons among the individual features demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during model construction. The results showed that the DNABP model achieved 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. The high prediction accuracy and performance comparisons with previous research suggest that DNABP could be a useful approach for identifying DNA-binding proteins from sequence information.
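The four figures reported in the abstract (accuracy, sensitivity, specificity and the Matthews correlation coefficient) are all functions of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts in the usage example are illustrative assumptions, not values or code from the paper.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: binding proteins predicted as binding / non-binding.
    tn/fp: non-binding proteins predicted as non-binding / binding.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print(example)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is commonly reported alongside sensitivity and specificity when the two classes may differ in size.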
Pages: 20
Related papers
50 in total
  • [21] Sequence-based prediction of DNA-binding sites on DNA-binding proteins
    Gou, Z.
    Hwang, S.
    Kuznetsov, I. B.
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2006, : 268 - +
  • [22] Predicting Target DNA Sequences of DNA-Binding Proteins Based on Unbound Structures
    Chen, Chien-Yu
    Chien, Ting-Ying
    Lin, Chih-Kang
    Lin, Chih-Wei
    Weng, Yi-Zhong
    Chang, Darby Tien-Hao
    PLOS ONE, 2012, 7 (02):
  • [23] Sequence-based Detection of DNA-binding Proteins using Multiple-View Features Allied with Feature Selection
    Zhou, Liling
    Song, Xiaoning
    Yu, Dong-Jun
    Sun, Jun
    MOLECULAR INFORMATICS, 2020, 39 (08)
  • [24] Sequence-based predictor of ATP-binding residues using random forest and mRMR-IFS feature selection
    Ma, Xin
    Sun, Xiao
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 360 : 59 - 66
  • [25] Shape string: A new feature for prediction of DNA-binding residues
    Wang, Duo-Duo
    Li, Tong-Hua
    Sun, Jiang-Ming
    Li, Da-Peng
    Xiong, Wen-Wei
    Wang, Wen-Yan
    Tang, Sheng-Nan
    BIOCHIMIE, 2013, 95 (02) : 354 - 358
  • [26] Predicting Functional Interactions Among DNA-Binding Proteins
    Khushi, Matloob
    Choudhury, Nazim
    Arthur, Jonathan W.
    Clarke, Christine L.
    Graham, J. Dinny
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 70 - 80
  • [27] iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model
    Lin, Wei-Zhong
    Fang, Jian-An
    Xiao, Xuan
    Chou, Kuo-Chen
    PLOS ONE, 2011, 6 (09):
  • [28] A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen-Shannon Divergence
    Dang, Truong Khanh Linh
    Meckbach, Cornelia
    Tacke, Rebecca
    Waack, Stephan
    Gueltas, Mehmet
    ENTROPY, 2016, 18 (10)
  • [29] Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins
    Kuznetsov, Igor B.
    Gou, Zhenkun
    Li, Run
    Hwang, Seungwoo
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2006, 64 (01) : 19 - 27
  • [30] Identification of DNA-binding Proteins Using Gapped-dipeptide Composition and Recursive Feature Elimination Algorithm
    Tang Ya-Dong
    Liu Xiao
    Liu Tai-Gang
    Xie Lu
    Chen Lan-Ming
    PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS, 2018, 45 (04) : 453 - 459