Motif-based protein sequence classification using neural networks

Cited by: 27
Authors
Blekas, K [1]
Fotiadis, DI
Likas, A
Affiliations
[1] Univ Ioannina, FORTH, Dept Comp Sci, GR-45110 Ioannina, Greece
[2] Univ Ioannina, FORTH, Biomed Res Inst, GR-45110 Ioannina, Greece
Keywords
protein sequence classification; neural networks; probabilistic motifs; MEME algorithm; motif-based features
DOI
10.1089/cmb.2005.12.64
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We present a system for multi-class protein classification based on neural networks. The central issue in building a neural network system for protein classification is the sequence encoding scheme used to feed the network. To address it, we propose a method that maps a protein sequence into a numerical feature space using the matching scores of the sequence against groups of conserved patterns (called motifs) in protein families. We consider two alternative ways of identifying the motifs used for feature generation and provide a comparative evaluation of the two schemes. We also evaluate the impact of incorporating background features (2-grams) on the performance of the neural system. Experimental results on real datasets indicate that the proposed method is highly efficient and superior to other well-known methods for protein classification.
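The encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how matching scores are computed, so the sketch assumes each motif is a position-specific scoring matrix of log-odds values and takes the best-scoring window as the feature. The names `motif_score`, `two_gram_features`, and `encode` are hypothetical, as is the choice to score unlisted residues as 0.0.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_score(sequence, pssm):
    """Best window score of `sequence` against one motif.

    `pssm` is a list of dicts, one per motif column, mapping a residue to
    a log-odds score. Residues missing from a column score 0.0 (an
    assumption made for this sketch).
    """
    width = len(pssm)
    if len(sequence) < width:
        return 0.0  # assumed convention for sequences shorter than the motif
    best = -math.inf
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, 0.0) for col, res in zip(pssm, window))
        best = max(best, score)
    return best


def two_gram_features(sequence):
    """Relative frequencies of the 400 ordered amino-acid pairs (2-grams)."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(sequence) - 1, 1)
    return [counts[key] / total for key in sorted(counts)]


def encode(sequence, motifs, use_two_grams=False):
    """Map a sequence to a numeric vector: one score per motif,
    optionally followed by the 2-gram background features."""
    features = [motif_score(sequence, pssm) for pssm in motifs]
    if use_two_grams:
        features += two_gram_features(sequence)
    return features


# Toy motif of width 2 (illustrative values, not taken from the paper).
toy_motif = [{"A": 1.0, "C": -1.0}, {"C": 2.0}]
vector = encode("GACA", [toy_motif], use_two_grams=True)
```

The resulting vector has one entry per motif plus 400 entries when the 2-gram option is on, and could then be fed to any standard feed-forward classifier.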
Pages: 64-82
Page count: 19