MTRAP: Pairwise sequence alignment algorithm by a new measure based on transition probability between two consecutive pairs of residues

Cited by: 9
Authors
Hara, Toshihide [1]
Sato, Keiko [1]
Ohya, Masanori [1]
Affiliations
[1] Tokyo Univ Sci, Dept Informat Sci, Noda, Chiba 278, Japan
Keywords
CLUSTAL-W; MULTIPLE; DATABASE; CONSISTENCY; ACCURACY; PROTEINS; HOMSTRAD; SEARCH; MATRIX; TOOL;
DOI
10.1186/1471-2105-11-235
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Sequence alignment is one of the most important techniques for analyzing biological systems. Current alignment methods are nevertheless imperfect, and more accurate ones are still needed. In particular, the alignment of homologous sequences with low sequence similarity remains unsatisfactory. Most recent methods for aligning protein sequences rely on an empirically determined measure, typically a combination of two quantities: (1) the sum of substitution scores between two residue segments, and (2) the sum of gap penalties in insertion/deletion regions. Such a measure assumes that there is no intersite correlation within the sequences. In this paper, we improve the alignment by taking the correlation between consecutive residues into account. Results: We introduce a new alignment method, MTRAP, based on a metric defined on compound systems of two sequences. In benchmark tests on PREFAB 4.0 and HOMSTRAD, our pairwise alignment method achieves higher accuracy than other methods such as ClustalW2, TCoffee, and MAFFT. The improvement is especially significant for sequences with less than 15% sequence identity. We also show that our algorithm works well with consistency-based progressive multiple alignment, by modifying TCoffee to use our measure. Conclusions: Our method yields a significant increase in alignment accuracy compared with other methods, and the improvement is clearest in the low-identity range. The source code is available at our web page, whose address is given in the section "Availability and requirements".
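The conventional measure described in the Background, quantities (1) and (2), can be illustrated with a minimal Python sketch. This is not the MTRAP measure and is not taken from the paper: the function names, the toy substitution scores, and the affine gap parameters are all hypothetical, chosen only to show how a score of this kind treats each aligned residue pair independently of its neighbors.

```python
def toy_sub(x, y):
    """Hypothetical substitution score: +2 for a match, -1 for a mismatch."""
    return 2.0 if x == y else -1.0


def conventional_score(a, b, sub=toy_sub, gap_open=10.0, gap_extend=0.5):
    """Score a gapped pairwise alignment as (1) the sum of substitution
    scores minus (2) the sum of affine gap penalties.

    `a` and `b` are the two aligned rows, with '-' marking a gap.
    The penalty values are illustrative assumptions only.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    gap_row = None  # row holding the currently open gap, if any
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two gaps")
        if x == "-" or y == "-":
            row = 0 if x == "-" else 1
            # Opening a gap costs more than extending an existing one.
            score -= gap_extend if gap_row == row else gap_open
            gap_row = row
        else:
            # Each residue pair is scored without regard to the pair
            # before it: this is the "no intersite correlation" assumption.
            score += sub(x, y)
            gap_row = None
    return score


print(conventional_score("AC-GT", "ACTGA"))  # prints -5.0
```

In this sketch the substitution term for a column never depends on the preceding column; the abstract's proposal is to replace that assumption with a measure that accounts for the transition between two consecutive residue pairs.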
Pages: 11