Sequence-Based Prediction of Protein-Peptide Binding Sites Using Support Vector Machine

Cited by: 85
Authors
Taherzadeh, Ghazaleh [1]
Yang, Yuedong [1,2]
Zhang, Tuo [3]
Liew, Alan Wee-Chung [1]
Zhou, Yaoqi [1,2]
Affiliations
[1] Griffith Univ, Sch Informat & Commun Technol, Parklands Dr, Southport, Qld 4215, Australia
[2] Griffith Univ, Inst Glycom, Parklands Dr, Southport, Qld 4215, Australia
[3] Weill Cornell Med Coll, 1300 York Ave, New York, NY 10065 USA
Funding
Australian Research Council; National Natural Science Foundation of China; UK Medical Research Council
Keywords
protein-peptide; binding site; sequence-based; prediction; features; machine learning; support vector machine; IDENTIFICATION; SURFACES; GENERATION; DATABASE
DOI
10.1002/jcc.24314
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Protein-peptide interactions are essential for all cellular processes, including DNA repair, replication, gene expression, and metabolism. As most protein-peptide interactions are uncharacterized, it is cost-effective to investigate them computationally as a first step. All existing approaches for predicting protein-peptide binding sites, however, are based on protein structures, despite the fact that the structures of most proteins are not yet solved. This article proposes the first machine-learning method, called SPRINT, to make Sequence-based prediction of Protein-peptide Residue-level Interactions. SPRINT yields robust and consistent performance in 10-fold cross-validation and on an independent test set. The most important feature is evolution-generated sequence profiles. For the test set (1056 binding and non-binding residues), it yields a Matthews correlation coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique is comparable to or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at: http://sparks-lab.org/. © 2016 Wiley Periodicals, Inc.
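For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical: the counts are chosen only to give 64% sensitivity and 68% specificity on a balanced set of 200 residues. They are not the paper's test-set counts, so the resulting MCC is not expected to match the reported 0.326.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, mcc) for a binary confusion matrix.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    Assumes both classes are non-empty.
    """
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical, balanced counts (illustration only, not data from the paper).
sens, spec, mcc = binary_metrics(tp=64, fn=36, tn=68, fp=32)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.3f}")
```

Unlike sensitivity and specificity taken separately, MCC depends on the ratio of binding to non-binding residues, which is why the same pair of rates can correspond to different MCC values on differently balanced test sets.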
Pages: 1223-1229
Page count: 7