Simplifying amino acid alphabets using a genetic algorithm and sequence alignment

Cited by: 0
Authors
Lenckowski, Jacek [1]
Walczak, Krzysztof [1]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Ul Nowowiejska 15-19, PL-00665 Warsaw, Poland
Source
EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS, PROCEEDINGS | 2007 / Vol. 4447
Keywords
amino acid alphabet; sequence alignment; substitution matrices; protein classification;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In some areas of bioinformatics, such as protein folding or sequence alignment, the full alphabet of amino acid symbols is not necessary, and simplified alphabets often give better results. Simplified alphabets are generally designed to be as universal as possible. In this paper we show that this approach may not be optimal. We present a genetic algorithm for alphabet simplification and use it in a method based on global sequence alignment. We demonstrate that our algorithm is much faster and produces better results than the previously presented genetic algorithm. We also compare alphabets constructed on the basis of universal substitution matrices such as BLOSUM with our alphabets built through sequence alignment, and propose a new coefficient describing the value of an alphabet in the sequence alignment context. Finally, we show that in sequence classification with a k-NN classifier our simplified alphabets give better results than most previously presented simplified alphabets and than the full 20-letter alphabet.
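For readers unfamiliar with the term, a simplified alphabet partitions the 20 amino acid symbols into groups and rewrites each residue as the representative of its group. The following is a minimal Python sketch of that idea only. The grouping in `GROUPS` and the helper names `build_mapping` and `simplify` are hypothetical, chosen for illustration; they are not the alphabets or the genetic algorithm described in the paper.

```python
# Illustrative sketch: rewriting a protein sequence in a simplified alphabet.
# GROUPS is a hypothetical 7-letter partition of the 20 standard residues,
# NOT an alphabet taken from the paper.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "G", "P"]


def build_mapping(groups):
    """Map every residue to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} appears in two groups")
            mapping[residue] = representative
    return mapping


def simplify(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet.

    Symbols outside the mapping (for example 'X') are left unchanged.
    """
    return "".join(mapping.get(residue, residue) for residue in sequence.upper())


if __name__ == "__main__":
    mapping = build_mapping(GROUPS)
    print(simplify("MKVLDE", mapping))
```

A search procedure such as a genetic algorithm would explore different partitions in place of the fixed `GROUPS` list and score each one, for instance by how well alignments or classification perform on the rewritten sequences.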
Pages: 122 / +
Page count: 3