Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

Cited by: 90
Authors
Li, Bi-Qing [2 ,3 ]
Feng, Kai-Yan [4 ]
Chen, Lei [5 ]
Huang, Tao [2 ,3 ,6 ]
Cai, Yu-Dong [1 ]
Affiliations
[1] Shanghai Univ, Inst Syst Biol, Shanghai, Peoples R China
[2] Chinese Acad Sci, Shanghai Inst Biol Sci, Key Lab Syst Biol, Shanghai, Peoples R China
[3] Shanghai Ctr Bioinformat Technol, Shanghai, Peoples R China
[4] Beijing Genom Inst, Shenzhen, Peoples R China
[5] Shanghai Maritime Univ, Coll Informat Engn, Shanghai, Peoples R China
[6] Mt Sinai Sch Med, Dept Genet & Genom Sci, New York, NY USA
Source
PLOS ONE | 2012 / Vol. 7 / Issue 08
Keywords
SECONDARY-STRUCTURE; SEQUENCE PROFILE; HOT-SPOTS; CLASSIFICATION; INTERFACES; PROGRAM; RESIDUE; IDENTIFICATION; INFORMATION; DATABASE;
DOI
10.1371/journal.pone.0043927
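Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract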
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
Pages: 10
Related papers
50 in total
  • [1] Prediction of Protein-Protein Interaction Sites by Multifeature Fusion and RF with mRMR and IFS
    Zhang, JunYan
    Lyu, Yinghua
    Ma, Zhiqiang
    DISEASE MARKERS, 2022, 2022
  • [2] Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm
    Li, Minjie
    Wu, Ziheng
    Wang, Wenyan
    Lu, Kun
    Zhang, Jun
    Zhou, Yuming
    Chen, Zhaoquan
    Li, Dan
    Zheng, Shicheng
    Chen, Peng
    Wang, Bing
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (06) : 3646 - 3654
  • [3] A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites
    Wei, Zhi-Sen
    Yang, Jing-Yu
    Shen, Hong-Bin
    Yu, Dong-Jun
    IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2015, 14 (07) : 746 - 760
  • [4] Applying the Naive Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites
    Murakami, Yoichi
    Mizuguchi, Kenji
    BIOINFORMATICS, 2010, 26 (15) : 1841 - 1848
  • [5] Prediction of protein-protein interaction sites using an ensemble method
    Deng, Lei
    Guan, Jihong
    Dong, Qiwen
    Zhou, Shuigeng
    BMC BIOINFORMATICS, 2009, 10
  • [6] Identifying Protein-Protein Interaction Sites Using Covering Algorithm
    Du, Xiuquan
    Cheng, Jiaxing
    Song, Jie
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2009, 10 (05) : 2190 - 2202
  • [7] CONDITIONAL RANDOM FIELD BASED ALGORITHM FOR PROTEIN-PROTEIN INTERACTION PREDICTION
    Liu, Wei
    Chen, Ling
    Li, Bin
    OXIDATION COMMUNICATIONS, 2016, 39 (2A) : 1896 - 1906
  • [8] Progress and challenges in predicting protein-protein interaction sites
    Ezkurdia, Iakes
    Bartoli, Lisa
    Fariselli, Piero
    Casadio, Rita
    Valencia, Alfonso
    Tress, Michael L.
    BRIEFINGS IN BIOINFORMATICS, 2009, 10 (03) : 233 - 246
  • [9] Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique
    Wang, Xiaoying
    Yu, Bin
    Ma, Anjun
    Chen, Cheng
    Liu, Bingqiang
    Ma, Qin
    BIOINFORMATICS, 2019, 35 (14) : 2395 - 2402
  • [10] Protein-protein interaction site prediction using random forest proximity distance
    Qiu, Zhijun
    Liu, Qingjie
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2021, 19 (01)