iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets
Cited by: 143
Authors: Jia, Jianhua [1,2]; Liu, Zi [1]; Xiao, Xuan [1,2]; Liu, Bingxiang [1]; Chou, Kuo-Chen [2,3]
Affiliations:
[1] Jing De Zhen Ceram Inst, Dept Comp, Jing De Zhen 333403, Peoples R China
[2] Gordon Life Sci Inst, Boston, MA 02478 USA
[3] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia
Source: MOLECULES | 2016 / Vol. 21 / No. 1
Keywords: protein-protein binding sites; physicochemical property; stationary wavelet transform; PseAAC; Optimize training dataset; KNNC; IHTS; target cross-validation; AMINO-ACID-COMPOSITION; LABEL LEARNING CLASSIFIER; M2 PROTON CHANNEL; SUBCELLULAR-LOCALIZATION; WEB-SERVER; K-TUPLE; PHYSICOCHEMICAL PROPERTIES; SECONDARY STRUCTURE; STRUCTURAL CLASS; GENERAL-FORM
DOI: 10.3390/molecules21010095
Chinese Library Classification: Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes: 071010; 081704
Abstract:
Knowledge of protein-protein interactions and their binding sites is indispensable for an in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods for the timely identification of protein-protein binding sites (PPBSs) from sequence information alone, because the information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting experimentally confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, the approach of using wavelets to express protein/peptide sequences might be the key to grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated by their low-frequency internal motions. To maximize convenience for most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.
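The two training-set treatments named in the abstract (KNNC and IHTS) can be illustrated with a minimal sketch. The rules used here are assumptions and are not reproduced from the paper:

* KNNC is assumed to discard a negative (non-binding) sample when most of its k nearest neighbours are positive.
* IHTS is assumed to add hypothetical positive samples by interpolating between a positive sample and one of its nearest positive neighbours. This is the SMOTE-style technique, swapped in for illustration.

The following are all hypothetical: the function names knn_clean and insert_hypothetical, the Euclidean distance, the parameter k, and the toy data. The published rules and parameters may differ.

```python
import math
import random

# Labels: 1 = positive (binding-site residue, minority class), 0 = negative.
# Each sample is (feature_vector, label); vectors are tuples of floats.
# Both cleaning/insertion rules below are illustrative assumptions.


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_clean(samples, k=3):
    """Hypothetical KNNC rule: drop a negative sample whose k nearest
    neighbours (among all other samples) are mostly positive."""
    kept = []
    for i, (x, y) in enumerate(samples):
        if y == 0:
            nearest = sorted(
                (euclidean(x, xj), yj)
                for j, (xj, yj) in enumerate(samples)
                if j != i
            )[:k]
            if sum(label for _, label in nearest) > k / 2:
                continue  # negative sits inside the positive region: remove it
        kept.append((x, y))
    return kept


def insert_hypothetical(samples, n_new, k=3, seed=0):
    """Hypothetical IHTS rule (SMOTE-style): create n_new synthetic positives,
    each on the segment between a random positive and one of its k nearest
    positive neighbours."""
    rng = random.Random(seed)
    positives = [x for x, y in samples if y == 1]
    if len(positives) < 2:
        return list(samples)  # nothing to interpolate between
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(positives))
        base = positives[i]
        nearest = sorted(
            (euclidean(base, p), j) for j, p in enumerate(positives) if j != i
        )[:k]
        _, j = rng.choice(nearest)
        gap = rng.random()  # position along the segment, in [0, 1)
        point = tuple(a + gap * (b - a) for a, b in zip(base, positives[j]))
        synthetic.append((point, 1))
    return list(samples) + synthetic


if __name__ == "__main__":
    toy = [
        ((0.0, 0.0), 1), ((0.1, 0.0), 1), ((0.0, 0.1), 1),
        ((0.05, 0.05), 0),  # negative buried among the positives
        ((5.0, 5.0), 0), ((5.1, 5.0), 0), ((5.0, 5.1), 0), ((5.1, 5.1), 0),
    ]
    cleaned = knn_clean(toy, k=3)
    balanced = insert_hypothetical(cleaned, n_new=1, k=2)
    print(len(cleaned), sum(y for _, y in balanced), len(balanced))
```

Run as a script, the toy example prints 7 4 8. The buried negative is cleaned away, and one synthetic positive brings the two classes to four samples each. In this sketch the final class ratio is controlled by n_new.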
Pages: 19