PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine

Authors
Yongchao Dou
Bo Yao
Chi Zhang
Affiliations
[1] School of Biological Sciences, Center for Plant Science and Innovation
[2] University of Nebraska
Source
Amino Acids | 2014 / Vol. 46
Keywords
Phosphorylation site prediction; Non-kinase-specific tool; Support vector machine
DOI
Not available
Abstract
Phosphorylation is one of the most essential post-translational modifications in eukaryotes. Studies on kinases and their substrates are important for understanding cellular signaling networks. Because large-scale wet-bench experiments are costly in time and labor, computational prediction of phosphorylation sites has become important, and many computational tools have been developed in recent decades. These prediction tools fall into two categories: kinase-specific and non-kinase-specific. As new sequencing technologies uncover more kinases, accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wider variety of species. In this work, a support vector machine is used to combine eight different sequence-level scoring functions to predict phosphorylation sites. The attributes used are Shannon entropy, relative entropy, predicted protein secondary structure, predicted protein disorder, solvent accessible area, overlapping properties, averaged cumulative hydrophobicity, and k-nearest neighbor; together they yielded better results than the attributes previously used by similar methods. The method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals under tenfold cross-validation. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values on this independent test dataset were 0.7761/0.6652/0.5958 for S/T/Y phosphorylation sites, which compared favorably with those of several existing methods. A web server based on the method was constructed for public use. The server, trained model, and all datasets used in the study are available at http://sysbio.unl.edu/PhosphoSVM.
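Two of the eight attributes named in the abstract, Shannon entropy and relative entropy, are standard conservation scores. The abstract does not say how PhosphoSVM computes them, so the following is only a minimal sketch of the textbook definitions applied to one column of aligned residues. The function names, the toy columns, and the uniform background distribution are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits) of the residue frequencies in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def relative_entropy(column, background):
    """Kullback-Leibler divergence (in bits) of the column from a background."""
    counts = Counter(column)
    total = len(column)
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )


# Hypothetical background: all 20 amino acids equally likely (an assumption;
# a real application would use database-derived frequencies).
uniform_bg = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

conserved = "SSSSSSSS"  # toy column: fully conserved serine
mixed = "SSTTYYAA"      # toy column: four residues, equally frequent

for name, col in (("conserved", conserved), ("mixed", mixed)):
    print(name, shannon_entropy(col), relative_entropy(col, uniform_bg))
```

A fully conserved column has zero Shannon entropy and the largest divergence from the background, while a mixed column scores higher entropy and lower divergence. Scores like these would form part of the feature vector passed to a classifier.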
Pages: 1459-1469
Page count: 10