Prediction of glycosylation sites using random forests

Times cited: 195
Authors
Hamby, Stephen E. [1]
Hirst, Jonathan D. [1]
Affiliation
[1] Univ Nottingham, Sch Chem, Nottingham NG7 2RD, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
DOI
10.1186/1471-2105-9-500
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Post-translational modifications (PTMs) occur in the vast majority of proteins and are essential for function. Prediction of the sequence location of PTMs enhances the functional characterisation of proteins. Glycosylation is one type of PTM, and is implicated in protein folding, transport and function.

Results: We use the random forest algorithm and pairwise patterns to predict glycosylation sites. We identify pairwise patterns surrounding glycosylation sites and use an odds ratio to weight their propensity of association with modified residues. Our prediction program, GPP (glycosylation prediction program), predicts glycosylation sites with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites. This is significantly better than current glycosylation predictors. We use the trepan algorithm to extract a set of comprehensible rules from GPP, which provide biological insight into all three major glycosylation types.

Conclusion: We have created an accurate predictor of glycosylation sites and used this to extract comprehensible rules about the glycosylation process. GPP is available online at http://comp.chem.nottingham.ac.uk/glyco/.
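The abstract states that pairwise patterns around a candidate site are weighted by an odds ratio. The Python sketch below illustrates that weighting. Several assumptions in it are not taken from the paper:

- A "pairwise pattern" is two residues at fixed offsets from the centre residue of a fixed-length sequence window.
- Patterns are counted once per window (presence or absence).
- A pseudocount of 0.5 keeps the ratio finite.

The summed log-odds function `window_score` is a hypothetical illustration of how such weights might become an input feature for a classifier such as a random forest. It is not GPP's actual feature set.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_patterns(window):
    """Return the set of (offset_i, res_i, offset_j, res_j) pairs in a window.

    Offsets are relative to the centre residue, assumed to be the candidate
    Ser/Thr/Asn site. This pattern definition is an assumption.
    """
    centre = len(window) // 2
    return {
        (i - centre, window[i], j - centre, window[j])
        for i, j in combinations(range(len(window)), 2)
    }


def odds_ratios(pos_windows, neg_windows, pseudo=0.5):
    """Odds ratio of each pattern occurring in positive vs negative windows."""
    # Each window contributes a set, so a pattern is counted once per window.
    pos_counts = Counter(p for w in pos_windows for p in pairwise_patterns(w))
    neg_counts = Counter(p for w in neg_windows for p in pairwise_patterns(w))
    n_pos, n_neg = len(pos_windows), len(neg_windows)
    ratios = {}
    for pat in set(pos_counts) | set(neg_counts):
        a = pos_counts[pat] + pseudo            # positives with the pattern
        b = n_pos - pos_counts[pat] + pseudo    # positives without it
        c = neg_counts[pat] + pseudo            # negatives with the pattern
        d = n_neg - neg_counts[pat] + pseudo    # negatives without it
        ratios[pat] = (a * d) / (b * c)
    return ratios


def window_score(window, ratios):
    """Hypothetical feature: sum of log odds ratios of the patterns present."""
    return sum(
        math.log(ratios[p]) for p in pairwise_patterns(window) if p in ratios
    )
```

Consider two N-centred positive windows and two D-centred negative windows. With these, `odds_ratios(["AANAS", "GKNAT"], ["AADAS", "GKDAT"])` gives the pattern `(0, 'N', 2, 'S')` an odds ratio of 5.0. Under the same weights, `window_score` is positive for `"AANAS"` and negative for `"AADAS"`.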
Pages: 13