Motif-All: discovering all phosphorylation motifs

Cited by: 32
Authors
He, Zengyou [1]
Yang, Can [2]
Guo, Guangyu [3]
Li, Ning [3]
Yu, Weichuan [2]
Affiliations
[1] Dalian Univ Technol, Sch Software, Dalian, Peoples R China
[2] Hong Kong Univ Sci & Technol, Lab Bioinformat & Computat Biol, Dept Elect & Comp Engn, Hong Kong, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Biol, Hong Kong, Hong Kong, Peoples R China
Keywords
SITE;
DOI
10.1186/1471-2105-12-S1-S22
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Phosphorylation motifs represent common patterns around the phosphorylation site. Discovering such motifs reveals the underlying regulation mechanism and facilitates the prediction of unknown phosphorylation events. Large amounts of phosphorylation data have now been gathered, making it possible to perform substrate-driven motif discovery using data mining techniques.
Results: We describe an algorithm called Motif-All that efficiently identifies all statistically significant motifs. The proposed method exploits a support constraint to reduce the search space and to avoid generating random artifacts. Because the number of phosphorylated peptides is far smaller than the number of unphosphorylated ones, we divide the mining process into two stages: the first stage generates candidates from the set of phosphorylated sequences using only the support constraint, and the second stage tests the statistical significance of each candidate using the odds ratio derived from the whole data set. Experimental results on real data show that Motif-All outperforms current algorithms in terms of both effectiveness and efficiency.
Conclusions: Motif-All is a useful tool for discovering statistically significant phosphorylation motifs. Source code and data sets are available at: http://bioinformatics.ust.hk/MotifAll.rar.
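The two-stage process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes fixed-width peptides aligned on the phosphorylation site, represents a motif as a set of (position, residue) pairs, uses a simple Apriori-style level-wise join for the support-constrained search, and scores candidates with an odds ratio smoothed by a 0.5 continuity correction. All function names and parameters (`frequent_motifs`, `odds_ratio`, `motif_all_sketch`, `min_support`, `min_odds_ratio`, `center`) are hypothetical.

```python
from itertools import combinations


def matches(motif, peptide):
    """True if the peptide carries every (position, residue) pair of the motif."""
    return all(peptide[pos] == aa for pos, aa in motif)


def support(motif, peptides):
    """Number of peptides that contain the motif."""
    return sum(matches(motif, p) for p in peptides)


def frequent_motifs(foreground, min_support, center):
    """Stage 1 (assumed form): level-wise search over phosphorylated peptides only."""
    width = len(foreground[0])
    # Level 1: single (position, residue) items, skipping the fixed central site.
    items = {(i, p[i]) for p in foreground for i in range(width) if i != center}
    level = [frozenset([it]) for it in items
             if support([it], foreground) >= min_support]
    found = []
    while level:
        found.extend(level)
        # Join pairs of k-item motifs into (k+1)-item candidates.
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            one_residue_per_position = len({pos for pos, _ in c}) == len(c)
            if len(c) == len(a) + 1 and one_residue_per_position:
                candidates.add(c)
        # Support is anti-monotone, so infrequent candidates are never extended.
        level = [c for c in candidates if support(c, foreground) >= min_support]
    return found


def odds_ratio(motif, foreground, background):
    """Stage 2 (assumed form): odds ratio from the 2x2 table over the whole data set."""
    a = support(motif, foreground)   # phosphorylated, motif present
    b = len(foreground) - a          # phosphorylated, motif absent
    c = support(motif, background)   # unphosphorylated, motif present
    d = len(background) - c          # unphosphorylated, motif absent
    # The 0.5 correction avoids division by zero; it is an assumption of this sketch.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def motif_all_sketch(foreground, background, min_support, min_odds_ratio, center):
    """Keep the frequent foreground motifs whose odds ratio passes the threshold."""
    results = []
    for motif in frequent_motifs(foreground, min_support, center):
        score = odds_ratio(motif, foreground, background)
        if score >= min_odds_ratio:
            results.append((sorted(motif), score))
    return results
```

Calling `motif_all_sketch(foreground, background, min_support=2, min_odds_ratio=2.0, center=2)` on lists of equal-length peptide strings returns the surviving motifs with their scores. The background set is consulted only in the second stage, which reflects the abstract's point that candidate generation is restricted to the much smaller phosphorylated set.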
Pages: 8
Related papers
11 in total
[1]  
Agrawal R., 1994, VLDB 1994, P487
[2]   A curated compendium of phosphorylation motifs [J].
Amanchy, Ramars ;
Periaswamy, Balamurugan ;
Mathivanan, Suresh ;
Reddy, Raghunath ;
Tattikota, Sudhir Gopal ;
Pandey, Akhilesh .
NATURE BIOTECHNOLOGY, 2007, 25 (03) :285-286
[3]   PhosPhAt: the Arabidopsis thaliana phosphorylation site database. An update [J].
Durek, Pawel ;
Schmidt, Robert ;
Heazlewood, Joshua L. ;
Jones, Alexandra ;
MacLean, Daniel ;
Nagel, Axel ;
Kersten, Birgit ;
Schulze, Waltraud X. .
NUCLEIC ACIDS RESEARCH, 2010, 38 :D828-D834
[4]   PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor [J].
Heazlewood, Joshua L. ;
Durek, Pawel ;
Hummel, Jan ;
Selbig, Joachim ;
Weckwerth, Wolfram ;
Walther, Dirk ;
Schulze, Waltraud X. .
NUCLEIC ACIDS RESEARCH, 2008, 36 :D1015-D1021
[5]   PostMod: sequence based prediction of kinase-specific phosphorylation sites with indirect relationship [J].
Jung, Inkyung ;
Matsuyama, Akihisa ;
Yoshida, Minoru ;
Kim, Dongsup .
BMC BIOINFORMATICS, 2010, 11
[6]   Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm [J].
Rigoutsos, I ;
Floratos, A .
BIOINFORMATICS, 1998, 14 (01) :55-67
[7]   Discovery of phosphorylation motif mixtures in phosphoproteomics data [J].
Ritz, Anna ;
Shakhnarovich, Gregory ;
Salomon, Arthur R. ;
Raphael, Benjamin J. .
BIOINFORMATICS, 2009, 25 (01) :14-21
[8]   An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets [J].
Schwartz, D ;
Gygi, SP .
NATURE BIOTECHNOLOGY, 2005, 23 (11) :1391-1398
[9]   Predicting Protein Post-translational Modifications Using Meta-analysis of Proteome Scale Data Sets [J].
Schwartz, Daniel ;
Chou, Michael F. ;
Church, George M. .
MOLECULAR & CELLULAR PROTEOMICS, 2009, 8 (02) :365-379
[10]  
Wassertheil-Smoller S., 2004, Biostatistics and epidemiology: A primer for health and biomedical professionals