Identifying Protein-Kinase-Specific Phosphorylation Sites Based on the Bagging-AdaBoost Ensemble Approach

Cited by: 20
Authors
Yu, Zhiwen [1]
Deng, Zhongkai [2 ]
Wong, Hau-San [2 ]
Tan, Lirong [2 ]
Affiliations
[1] S China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Keywords
Adaptive boosting (AdaBoost); bagging; ensemble; kinase family; phosphorylation sites; prediction; KNOWLEDGE-BASED POTENTIALS; EUKARYOTIC PROTEINS; WIDE PREDICTION; CLASSIFIER; RESPECT; SEQUENCE;
DOI
10.1109/TNB.2010.2043682
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Protein phosphorylation is an important step in many biological processes, such as the cell cycle, membrane transport, and apoptosis. To obtain more useful information about protein phosphorylation, it is necessary to develop a robust, stable, and accurate approach to predicting phosphorylation sites. Although a number of approaches to predicting phosphorylation sites exist, such as those based on neural networks and support vector machines, they use only a single classifier, and their prediction results are in general not very stable or robust. In this paper, we design a new classifier ensemble approach called the Bagging-AdaBoost ensemble (BAE) for the prediction of eukaryotic protein phosphorylation sites, which incorporates the bagging and AdaBoost techniques into the classifier framework to improve the accuracy, stability, and robustness of the final result. To our knowledge, this is the first time that a combined bagging and boosting ensemble approach has been applied to predict phosphorylation sites. Our prediction system based on BAE focuses on six kinase families: CDK, CK2, MAPK, PKA, PKC, and SRC. BAE achieves good performance on these six families, with prediction accuracies of 0.8634, 0.8721, 0.8542, 0.8537, 0.8052, and 0.7432, respectively.
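The general idea of combining bagging with AdaBoost can be illustrated with a minimal sketch. The abstract does not state the base learners, the features, or the combination rule used in BAE, so everything below is an assumption made for illustration: decision stumps serve as weak learners, one AdaBoost model is trained per bootstrap sample, and the bags are combined by majority vote. The function names (`adaboost_fit`, `bae_fit`, and so on) and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                # The stump predicts `pol` when x[f] >= thr, otherwise `-pol`.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[f] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] >= thr else -pol

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1} (assumed weak learner: stumps)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

def bae_fit(X, y, n_bags=5, rounds=10, seed=0):
    """Bagging over AdaBoost: one boosted model per bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(adaboost_fit([X[i] for i in idx],
                                     [y[i] for i in idx], rounds))
    return ensemble

def bae_predict(ensemble, row):
    """Majority vote over the boosted models (assumed combination rule)."""
    votes = sum(adaboost_predict(m, row) for m in ensemble)
    return 1 if votes >= 0 else -1

# Hypothetical toy data: one numeric feature, +1 = site, -1 = non-site.
X_toy = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y_toy = [-1, -1, -1, -1, 1, 1, 1, 1]
ensemble = bae_fit(X_toy, y_toy)
print(bae_predict(ensemble, [0.0]), bae_predict(ensemble, [1.0]))
```

In this sketch, bagging is meant to reduce the variance of the individual boosted models while boosting reduces their bias, which matches the stability and accuracy goals named in the abstract. A real predictor would replace the toy feature with an encoding of the sequence around each candidate site and would train one model per kinase family.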
Pages: 132-143
Number of pages: 12