Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins

Cited by: 58
Authors
Durek, Pawel [1]
Schudoma, Christian [1]
Weckwerth, Wolfram [3]
Selbig, Joachim [1, 2]
Walther, Dirk [1]
Affiliations
[1] Max Planck Inst Mol Plant Physiol, D-14476 Potsdam, Germany
[2] Univ Potsdam, Inst Biochem & Biol, D-14476 Potsdam, Germany
[3] Univ Vienna, A-1090 Vienna, Austria
Keywords
SUPPORT VECTOR MACHINES; MICROARRAY DATA; DOCKING INTERACTIONS; SIGNAL-TRANSDUCTION; SEQUENCE ALIGNMENT; KINASE; DATABASE; RECOGNITION; PATTERNS; CLASSIFICATION
DOI
10.1186/1471-2105-10-117
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases, followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. Both the experimental identification and the computational prediction of phosphorylation sites (P-sites) have proved to be challenging problems. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites.
Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the Protein Data Bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphoserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernible, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence and structural information. Compared with sequence-only prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information.
Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.
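To make the idea of a "3D-profile" concrete, the following is a minimal Python sketch of one way to count amino acid residue types in concentric distance shells around a candidate site. It is an illustration under stated assumptions, not the method used in the paper: the function name `radial_profile`, the shell boundaries (4, 8, and 12 angstroms), the one-point-per-residue representation, and the toy coordinates are all hypothetical, since the abstract does not specify them.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes, in a fixed order
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def radial_profile(site_xyz, neighbors, shell_edges=(0.0, 4.0, 8.0, 12.0)):
    """Count residue types per distance shell around a candidate site.

    site_xyz:    (x, y, z) of the candidate serine/threonine/tyrosine
    neighbors:   iterable of (one_letter_code, (x, y, z)) tuples for the
                 other residues (the site itself is assumed to be excluded)
    shell_edges: hypothetical shell boundaries in angstroms (an assumption,
                 not values taken from the paper)

    Returns a flat list with one count per (shell, amino acid) pair.
    """
    n_shells = len(shell_edges) - 1
    counts = [Counter() for _ in range(n_shells)]
    for aa, xyz in neighbors:
        d = math.dist(site_xyz, xyz)
        for i in range(n_shells):
            # Half-open shells [lower, upper); residues beyond the last
            # edge are ignored
            if shell_edges[i] <= d < shell_edges[i + 1]:
                counts[i][aa] += 1
                break
    return [counts[i][aa] for i in range(n_shells) for aa in AMINO_ACIDS]


# Toy example with made-up coordinates (not real structural data)
site = (0.0, 0.0, 0.0)
toy_neighbors = [
    ("R", (3.0, 0.0, 0.0)),   # falls in shell 0
    ("R", (0.0, 5.0, 0.0)),   # falls in shell 1
    ("P", (0.0, 0.0, 10.0)),  # falls in shell 2
    ("E", (20.0, 0.0, 0.0)),  # outside all shells, ignored
]
profile = radial_profile(site, toy_neighbors)
```

A vector of this kind could in principle be concatenated with an encoding of the local sequence window and passed to a classifier such as a Support Vector Machine, which is the general combination of sequence and structural information the abstract describes.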
Pages: 17