iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets

Cited by: 143
Authors
Jia, Jianhua [1,2]
Liu, Zi [1]
Xiao, Xuan [1,2]
Liu, Bingxiang [1]
Chou, Kuo-Chen [2,3]
Affiliations
[1] Jing De Zhen Ceram Inst, Dept Comp, Jing De Zhen 333403, Peoples R China
[2] Gordon Life Sci Inst, Boston, MA 02478 USA
[3] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia
Source
MOLECULES | 2016 / Volume 21 / Issue 01
Keywords
protein-protein binding sites; physicochemical property; stationary wavelet transform; PseAAC; Optimize training dataset; KNNC; IHTS; target cross-validation; AMINO-ACID-COMPOSITION; LABEL LEARNING CLASSIFIER; M2 PROTON CHANNEL; SUBCELLULAR-LOCALIZATION; WEB-SERVER; K-TUPLE; PHYSICOCHEMICAL PROPERTIES; SECONDARY STRUCTURE; STRUCTURAL CLASS; GENERAL-FORM;
DOI
10.3390/molecules21010095
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Knowledge of protein-protein interactions and their binding sites is indispensable for an in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods that can identify protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because the information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we propose a new predictor, called iPPBS-Opt, which uses: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) an ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting experiment-confirmed results demonstrate that the new predictor is very promising, implying that these practices are indeed effective. In particular, using wavelets to express protein/peptide sequences might be the key to grasping the essence of the problem, which is consistent with findings that many important biological functions of proteins can be elucidated through their low-frequency internal motions. For the convenience of most experimental scientists, we provide a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to obtain the desired results without going through the complicated mathematical equations involved.
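The abstract names two dataset treatments, KNNC and IHTS, without giving their details. The following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that cleaning means dropping majority-class samples that have a minority-class sample among their k nearest neighbours, and that hypothetical samples are made by interpolating between a minority sample and its nearest minority neighbour. The function names `knn_clean` and `insert_hypothetical`, the value k=3, and the toy 2-D feature vectors are all hypothetical.

```python
import math
import random


def knn_clean(majority, minority, k=3):
    """Keep only majority samples whose k nearest neighbours are all majority.

    Assumed cleaning rule for illustration; the paper's exact rule may differ.
    """
    labelled = [(x, 0) for x in majority] + [(x, 1) for x in minority]
    kept = []
    for i, x in enumerate(majority):
        # Distances from x to every other sample (index i is x itself).
        others = [
            (math.dist(x, y), label)
            for j, (y, label) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        if all(label == 0 for _, label in others[:k]):
            kept.append(x)
    return kept


def insert_hypothetical(minority, n_new, rng):
    """Create n_new synthetic minority samples by linear interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and its nearest minority neighbour (an assumed, SMOTE-like rule).
    Requires at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min(
            (m for m in minority if m is not a),
            key=lambda m: math.dist(a, m),
        )
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Toy data: five non-binding (majority) and two binding (minority) samples.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.9)]
minority = [(1.0, 1.0), (1.0, 0.8)]

cleaned = knn_clean(majority, minority, k=3)
# Top up the minority class until both classes have the same size.
synthetic = insert_hypothetical(minority, len(cleaned) - len(minority), random.Random(0))
balanced_minority = minority + synthetic
```

In this toy run the majority sample sitting next to the minority cluster is removed, and the minority class is then topped up to the size of the cleaned majority class. In practice the inputs would be the feature vectors built from the sequence windows rather than 2-D points.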
Pages: 19