Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information

Cited by: 68
Authors
Biswas, Ashis Kumer [1]
Noman, Nasimul [1]
Sikder, Abdur Rahman [2]
Affiliations
[1] Univ Dhaka, Dept Comp Sci & Engn, Dhaka 1000, Bangladesh
[2] Univ Dhaka, Ctr Adv Res Chem Phys Biol & Pharmaceut Sci, Dhaka 1000, Bangladesh
Keywords
PSI-BLAST; DATABASE SEARCHES; SEQUENCE; TURNS; TOOL;
DOI
10.1186/1471-2105-11-273
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Most existing in silico phosphorylation site prediction systems use a machine learning approach, which requires a good set of classification data to build the classification knowledge. Because phosphorylation is catalyzed by kinase enzymes, most existing systems use the kinase information of the phosphorylated sites as their major classification data. The number of kinase annotations in protein sequences is far smaller than the number of proteins sequenced to date, so prediction systems that rely on the small clique of kinase-annotated proteins cannot be considered reliable for predicting outside that clique. Such systems are therefore not generalized. In this paper, a novel generalized prediction system, PPRED (Phosphorylation PREDictor), is proposed that ignores the kinase information and uses only the evolutionary information of proteins to classify phosphorylation sites. Results: Experimental results based on cross validations and an independent benchmark reveal the significance of using the evolutionary information alone to classify phosphorylation sites from protein sequences. The prediction performance of the proposed system is better than that of existing prediction systems that also do not incorporate kinase information. The system is also comparable to systems that incorporate kinase information in predicting such sites. Conclusions: The approach presented in this paper provides an efficient way to identify phosphorylation sites in a given protein primary sequence. This is valuable information for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing generalized prediction systems for post-translational modifications such as phosphorylation or glycosylation. PPRED is publicly available at the URL http://www.cse.univdhaka.edu/ashis/ppred/index.php.
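As a rough illustration of how evolutionary information might be turned into classifier input, the sketch below collects a window of profile rows around each serine, threonine, or tyrosine residue. The abstract does not specify PPRED's feature encoding, so everything here is an assumption made for illustration: the function name `window_features`, the `half_width` value, the zero padding at the sequence ends, and the idea that the profile is a per-residue score matrix such as one produced by PSI-BLAST.

```python
from typing import List, Tuple

# Residues that can carry a phosphate group in eukaryotic proteins.
TARGET_RESIDUES = {"S", "T", "Y"}


def window_features(
    sequence: str,
    profile: List[List[float]],
    half_width: int = 4,
) -> List[Tuple[int, List[float]]]:
    """Return (position, flat feature vector) for each S/T/Y residue.

    `profile` is assumed to hold one row of scores per residue (for
    example, 20 columns from a position-specific scoring matrix).
    Window positions that fall outside the sequence are padded with
    zero rows. `half_width` is a hypothetical choice, not a value
    taken from the paper.
    """
    if len(profile) != len(sequence):
        raise ValueError("profile must have one row per residue")
    n_cols = len(profile[0]) if profile else 0
    pad = [0.0] * n_cols

    features = []
    for i, residue in enumerate(sequence):
        if residue not in TARGET_RESIDUES:
            continue
        vector: List[float] = []
        # Concatenate the profile rows from i - half_width to i + half_width.
        for j in range(i - half_width, i + half_width + 1):
            vector.extend(profile[j] if 0 <= j < len(sequence) else pad)
        features.append((i, vector))
    return features
```

Each returned vector could then be passed, together with a phosphorylated or non-phosphorylated label, to any standard classifier; which classifier and labels PPRED actually uses is not stated in the abstract.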
Pages: 17