Sequence-Based Prediction of Protein-Peptide Binding Sites Using Support Vector Machine

Cited by: 85

Authors:

Taherzadeh, Ghazaleh [1]

Yang, Yuedong [1,2]

Zhang, Tuo [3]

Liew, Alan Wee-Chung [1]

Zhou, Yaoqi [1,2]

Affiliations:

[1] Griffith Univ, Sch Informat & Commun Technol, Parklands Dr, Southport, Qld 4215, Australia

[2] Griffith Univ, Inst Glycom, Parklands Dr, Southport, Qld 4215, Australia

[3] Weill Cornell Med Coll, 1300 York Ave, New York, NY 10065 USA

Source:

JOURNAL OF COMPUTATIONAL CHEMISTRY | 2016 / Volume 37 / Issue 13

Funding:

Australian Research Council; National Natural Science Foundation of China; Medical Research Council (UK);

Keywords:

protein-peptide; binding site; sequence-based; prediction; features; machine learning; support vector machine; IDENTIFICATION; SURFACES; GENERATION; DATABASE;

DOI:

10.1002/jcc.24314

Chinese Library Classification number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Protein-peptide interactions are essential for all cellular processes, including DNA repair, replication, gene expression, and metabolism. As most protein-peptide interactions are uncharacterized, it is cost-effective to investigate them computationally as a first step. All existing approaches for predicting protein-peptide binding sites, however, are based on protein structures, despite the fact that the structures of most proteins are not yet solved. This article proposes the first machine-learning method, called SPRINT, to make Sequence-based prediction of Protein-peptide Residue-level Interactions. SPRINT yields robust and consistent performance in 10-fold cross-validation and on an independent test set. The most important feature is evolution-generated sequence profiles. For the test set (1056 binding and non-binding residues), it yields a Matthews correlation coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique is comparable to or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at: http://sparks-lab.org/. (C) 2016 Wiley Periodicals, Inc.
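The three metrics reported in the abstract (sensitivity, specificity, and the Matthews correlation coefficient) can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    """
    # Sensitivity: fraction of true binding residues that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (not data from the paper).
sens, spec, mcc = binary_metrics(tp=40, fp=20, tn=30, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.3f}")
```

Unlike sensitivity and specificity taken individually, the MCC combines all four counts into a single value, which is why it is commonly reported when binding and non-binding residues may be unevenly represented.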

Citation

Pages: 1223-1229

Page count: 7

40 references in total

[1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].