iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets

Cited by: 143

Authors:

Jia, Jianhua [1,2]

Liu, Zi [1]

Xiao, Xuan [1,2]

Liu, Bingxiang [1]

Chou, Kuo-Chen [2,3]

Affiliations:

[1] Jing De Zhen Ceram Inst, Dept Comp, Jing De Zhen 333403, Peoples R China

[2] Gordon Life Sci Inst, Boston, MA 02478 USA

[3] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia

Source:

MOLECULES | 2016 / Vol. 21 / Issue 01

Keywords:

protein-protein binding sites; physicochemical property; stationary wavelet transform; PseAAC; Optimize training dataset; KNNC; IHTS; target cross-validation; AMINO-ACID-COMPOSITION; LABEL LEARNING CLASSIFIER; M2 PROTON CHANNEL; SUBCELLULAR-LOCALIZATION; WEB-SERVER; K-TUPLE; PHYSICOCHEMICAL PROPERTIES; SECONDARY STRUCTURE; STRUCTURAL CLASS; GENERAL-FORM;

DOI:

10.3390/molecules21010095

Chinese Library Classification (CLC) codes:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Knowledge of protein-protein interactions and their binding sites is indispensable for an in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods that identify protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because the information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting the experiment-confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, using wavelets to express protein/peptide sequences might be the key to grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated through their low-frequency internal motions. For the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.
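The abstract names the KNNC and IHTS treatments but does not define them. As a rough illustration only, the sketch below assumes that KNNC discards any negative (non-binding) sample that has a positive sample among its k nearest neighbours, and that IHTS adds synthetic positive samples by interpolating between pairs of real positives (a SMOTE-like stand-in). The function names, the Euclidean distance, and the value of `k` are hypothetical choices made for this example and are not taken from the paper.

```python
import math
import random


def knn_clean(pos, neg, k=3):
    """Assumed KNNC step: keep a negative sample only if all of its
    k nearest neighbours (Euclidean distance) are also negative."""
    labelled = [(p, 1) for p in pos] + [(n, 0) for n in neg]
    kept = []
    for i, n in enumerate(neg):
        own_index = len(pos) + i  # position of this sample in `labelled`
        others = [
            (math.dist(n, x), y)
            for j, (x, y) in enumerate(labelled)
            if j != own_index
        ]
        others.sort(key=lambda item: item[0])
        if all(label == 0 for _, label in others[:k]):
            kept.append(n)
    return kept


def insert_hypothetical(pos, n_new, rng):
    """Assumed IHTS step: build n_new synthetic positives, each a random
    point on the segment between two distinct real positives."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(pos, 2)
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


def balance(pos, neg, k=3, seed=0):
    """Clean the negatives, then top up the positives until both classes
    have the same size. Returns (positives, negatives)."""
    rng = random.Random(seed)
    neg_clean = knn_clean(pos, neg, k)
    n_new = max(0, len(neg_clean) - len(pos))
    return pos + insert_hypothetical(pos, n_new, rng), neg_clean


# Toy 2-D feature vectors; real inputs would be per-residue feature vectors.
pos = [(0.0, 0.0), (1.0, 0.0)]
neg = [(0.5, 0.1), (10.0, 10.0), (10.0, 11.0),
       (11.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
new_pos, new_neg = balance(pos, neg, k=2)
print(len(new_pos), len(new_neg))
```

In this toy input the negative at `(0.5, 0.1)` sits between the two positives and is removed, after which the positive class is padded with interpolated points until the classes are the same size.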


Pages: 19

