Glycosylation site prediction using ensembles of Support Vector Machine classifiers

Cited by: 130
Authors
Caragea, Cornelia [1, 2]
Sinapov, Jivko [1, 2]
Silvescu, Adrian [1, 2]
Dobbs, Drena [3, 4]
Honavar, Vasant [1, 2]
Affiliations
[1] Iowa State Univ, Dept Comp Sci, Artificial Intelligence Res Lab, Ames, IA 50011 USA
[2] Iowa State Univ, Ctr Comp Intelligence Learning & Discovery, Ames, IA 50011 USA
[3] Iowa State Univ, Dept Genet Dev & Cell Biol, Ames, IA 50011 USA
[4] Iowa State Univ, Bioinformat & Computat Biol, Ames, IA 50011 USA
DOI
10.1186/1471-2105-8-438
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences.

Results: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction.

Conclusion: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.
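
The abstract describes classifying a target residue from its sequence neighbors and combining several classifiers into an ensemble. The sketch below illustrates those two ideas only; it is not the authors' EnsembleGly implementation. The window half-width, the padding symbol, the choice of S/T as candidate residues, the majority-vote rule, and the toy base classifiers (plain functions standing in for trained Support Vector Machines) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

PAD = "-"  # assumed padding symbol for positions beyond the sequence ends


def extract_windows(seq: str, targets: str = "ST",
                    half_width: int = 2) -> List[Tuple[int, str]]:
    """Return (position, window) pairs centered on each candidate residue."""
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue in targets:
            # In the padded string, residue i sits at index i + half_width,
            # so this slice is centered on it.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def majority_vote(classifiers: Sequence[Callable[[str], bool]],
                  window: str) -> bool:
    """Predict 'glycosylated' when more than half of the classifiers agree."""
    votes = sum(1 for clf in classifiers if clf(window))
    return 2 * votes > len(classifiers)


# Hypothetical stand-ins for trained base classifiers (not real SVMs).
toy_classifiers = [
    lambda w: "P" in w,                  # proline anywhere in the window
    lambda w: w[len(w) // 2] == "T",     # central residue is threonine
    lambda w: w.startswith("M"),         # window starts with methionine
]

if __name__ == "__main__":
    for pos, window in extract_windows("MKTAPSLG"):
        print(pos, window, majority_vote(toy_classifiers, window))
```

In a real setting each base classifier would be a model trained on encoded windows, and the combination rule could differ from simple majority voting.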
Pages: 13
Related papers
40 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] [Anonymous], 1999, Advances in Large Margin Classifiers
  • [3] [Anonymous], 1997, Machine Learning
  • [4] THE SWISS-PROT PROTEIN-SEQUENCE DATA-BANK, RECENT DEVELOPMENTS
    BAIROCH, A
    BOECKMANN, B
    [J]. NUCLEIC ACIDS RESEARCH, 1993, 21 (13) : 3093 - 3096
  • [5] Assessing the accuracy of prediction algorithms for classification: an overview
    Baldi, P
    Brunak, S
    Chauvin, Y
    Andersen, CAF
    Nielsen, H
    [J]. BIOINFORMATICS, 2000, 16 (05) : 412 - 424
  • [6] Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence
    Blom, N
    Sicheritz-Pontén, T
    Gupta, R
    Gammeltoft, S
    Brunak, S
    [J]. PROTEOMICS, 2004, 4 (06) : 1633 - 1649
  • [7] A tutorial on Support Vector Machines for pattern recognition
    Burges, CJC
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) : 121 - 167
  • [8] Chawla N. V., 2006, DATA MIN KNOWL DISC, V5, P853
  • [9] Christlet THT, 2001, BIOPHYS J, V80, P952, DOI 10.1016/S0006-3495(01)76074-2
  • [10] Ensemble methods in machine learning
    Dietterich, TG
    [J]. MULTIPLE CLASSIFIER SYSTEMS, 2000, 1857 : 1 - 15