Support-vector-machine classification of linear functional motifs in proteins

Times cited: 7
Authors
Plewczynski, D
Tkacz, A
Wyrwicz, L
Godzik, A
Kloczkowski, A
Rychlewski, L
Affiliations
[1] Univ Warsaw, Interdisciplinary Ctr Math & Computat Modeling, PL-02106 Warsaw, Poland
[2] BioInfoBank Inst, PL-60744 Poznan, Poland
[3] Adam Mickiewicz Univ Poznan, Bioinformat Unit, Dept Phys, PL-61614 Poznan, Poland
[4] Univ Calif San Diego, Bioinformat Core JCSG, La Jolla, CA 92093 USA
[5] Burnham Inst, La Jolla, CA 92037 USA
[6] Iowa State Univ, Baker Ctr Bioinformat & Biol Stat, Ames, IA USA
Keywords
kinase substrate prediction; profile-profile sequence similarity; local structural segments; linear functional motifs; Swiss-Prot database; support vector machine (SVM)
DOI
10.1007/s00894-005-0070-2
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Our algorithm predicts short linear functional motifs in proteins from sequence information alone. Statistical models of these motifs are built from a database of short sequence fragments taken from proteins in the current release of the Swiss-Prot database. Each fragment has been experimentally confirmed to carry a single-residue post-translational modification. To analyze a query protein, its sequence is dissected into short overlapping fragments. Each fragment is represented as a vector, and each vector is classified by a machine-learning algorithm (a Support Vector Machine) as potentially modifiable or not. The resulting list of plausible post-translational modification sites in the query protein is returned to the user. The sensitivity of the classification is approximately 70% for the various types of short linear motifs studied. We also present a study of the human protein kinase C family as a biological application of the method.
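The prediction workflow summarized in the abstract can be sketched as below. It cuts the query into overlapping fragments, turns each fragment into a vector, scores each vector with an SVM, and reports the positive sites. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the window half-width of 4;
- the one-hot residue encoding;
- the candidate residues `STY`;
- the function names `fragments`, `encode`, `train_linear_svm` and `predict_sites`.

The SVM here is a plain linear model trained by Pegasos-style stochastic subgradient descent. It is a stand-in for whatever kernel and solver the original study used. The keywords suggest the paper's fragment vectors draw on profile-profile sequence similarity, which this sketch does not attempt to reproduce.

```python
import random

# Hypothetical encoding choices; the paper's own fragment representation is not reproduced here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"  # padding symbol for windows that run past either end of the sequence
ALPHABET = AMINO_ACIDS + GAP
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def fragments(seq, half_width=4):
    """Yield (position, window) for every residue; each window is centred on that residue."""
    padded = GAP * half_width + seq + GAP * half_width
    for i in range(len(seq)):
        yield i, padded[i:i + 2 * half_width + 1]


def encode(window):
    """One-hot encode a window (one block of len(ALPHABET) slots per position).

    Unknown residues (e.g. 'X') share the GAP slot. A trailing constant 1.0
    acts as a bias feature, so the bias is regularised together with the weights.
    """
    vec = [0.0] * (len(window) * len(ALPHABET) + 1)
    for pos, aa in enumerate(window):
        vec[pos * len(ALPHABET) + INDEX.get(aa, INDEX[GAP])] = 1.0
    vec[-1] = 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.

    X is a list of encoded vectors, y a list of labels in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * dot(w, X[i])  # margin under the current weights
            # Shrink step from the regulariser: (1 - eta * lam) == (1 - 1 / t).
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1.0:  # hinge loss is active: move towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict_sites(seq, w, half_width=4, candidates="STY"):
    """Return 0-based positions of candidate residues whose window gets a positive SVM score."""
    return [
        i for i, window in fragments(seq, half_width)
        if seq[i] in candidates and dot(w, encode(window)) > 0
    ]
```

Training windows are labelled +1 when a confirmed modification site sits at the centre and -1 otherwise. With such data, `predict_sites(query, train_linear_svm(X, y))` returns the positions of candidate residues that the linear model scores as potentially modifiable. The pure-Python loops keep the sketch dependency-free; a practical pipeline would more likely rely on an established SVM library.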
Pages: 453-461
Number of pages: 9