DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues

Cited by: 32
Authors
Ma, Xin [1]
Guo, Jing [2]
Sun, Xiao [2]
Affiliations
[1] Nanjing Audit Univ, Sch Sci, Nanjing, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing, Jiangsu, Peoples R China
Source
PLOS ONE | 2016 / Vol. 11 / Issue 12
Funding
National Natural Science Foundation of China;
Keywords
SUPPORT VECTOR MACHINES; SEQUENCE-BASED PREDICTION; AMINO-ACID-COMPOSITION; MRMR FEATURE-SELECTION; RIBOSOMAL-RNA-BINDING; WEB SERVER; INFORMATION; SVM; REDUNDANCY; RELEVANCE;
DOI
10.1371/journal.pone.0167345
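Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code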
07 ; 0710 ; 09 ;
Abstract
DNA-binding proteins are fundamentally important in cellular processes. Several computational methods have been developed in previous years to improve the prediction of DNA-binding proteins. However, insufficient work has been done on predicting DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using a random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features: one reflects the conservation of the physicochemical properties of amino acids, and the other reflects the binding propensity of DNA-binding residues together with the non-binding propensity of non-binding residues. Comparisons among the individual features demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during model construction. The results showed that the DNABP model achieved 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. The high prediction accuracy and the performance comparisons with previous research suggest that DNABP could be a useful approach for identifying DNA-binding proteins from sequence information.
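The four performance measures quoted in the abstract (accuracy, sensitivity, specificity and the Matthews correlation coefficient) are standard functions of a binary confusion matrix. The following is a minimal sketch of how they are commonly computed, not the authors' code. The function name `binary_metrics`, the zero-denominator convention and the example counts are assumptions made for illustration; the counts are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    tp/fn count DNA-binding proteins predicted as binding/non-binding;
    tn/fp count non-binding proteins predicted as non-binding/binding.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(num, den):
        # Assumed convention: report 0.0 when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only; these are NOT results from the paper.
example = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside the other three measures when the two classes differ in size.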
Pages: 20
Related papers
50 in total
  • [1] Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naive Bayes
    Lou, Wangchao
    Wang, Xiaoqing
    Chen, Fan
    Chen, Yixiao
    Jiang, Bo
    Zhang, Hua
    PLOS ONE, 2014, 9 (01):
  • [2] Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature
    Wu, Jiansheng
    Liu, Hongde
    Duan, Xueye
    Ding, Yan
    Wu, Hongtao
    Bai, Yunfei
    Sun, Xiao
    BIOINFORMATICS, 2009, 25 (01) : 30 - 35
  • [3] Structure based prediction of binding residues on DNA-binding proteins
    Bhardwaj, Nitin
    Langlois, Robert E.
    Hui, Guijun Zhao
    2005 27TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2005, : 2611 - 2614
  • [4] Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods
    Qu, Kaiyang
    Han, Ke
    Wu, Song
    Wang, Guohua
    Wei, Leyi
    MOLECULES, 2017, 22 (10):
  • [5] Predicting a DNA-binding protein using random forest with multiple mathematical features
    Guan, Changge
    Niu, Xiaohui
    Shi, Feng
    Yang, Kun
    Li, Nana
    BIO-MEDICAL MATERIALS AND ENGINEERING, 2015, 26 : S1883 - S1889
  • [6] Predicting DNA-binding Proteins Using Feature Fusion and MSVM-RFE
    Ji, Guoli
    Lin, Yang
    Lin, Qiamnin
    Huang, Guangzao
    Zhu, Wenbing
    You, Wenjie
    PROCEEDINGS OF 2016 10TH IEEE INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (ASID), 2016, : 109 - 112
  • [7] Predicting DNA-Binding Proteins and Binding Residues by Complex Structure Prediction and Application to Human Proteome
    Zhao, Huiying
    Wang, Jihua
    Zhou, Yaoqi
    Yang, Yuedong
    PLOS ONE, 2014, 9 (05):
  • [8] Improved Prediction of DNA-Binding Proteins Using Chaos Game Representation and Random Forest
    Niu, Xiaohui
    Hu, Xuehai
    CURRENT BIOINFORMATICS, 2016, 11 (02) : 156 - 163
  • [9] Identifying DNA-binding proteins based on multi-features and LASSO feature selection
    Zhang, Shengli
    Zhu, Fu
    Yu, Qianhao
    Zhu, Xiaoyue
    BIOPOLYMERS, 2021, 112 (02)
  • [10] KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest
    Jia, Yuran
    Huang, Shan
    Zhang, Tianjiao
    FRONTIERS IN GENETICS, 2021, 12