Two Hybrid Algorithms for Multiple Sequence Alignment

Cited by: 0

Authors:

Naznin, Farhana ^{[1]}

Sarker, Ruhul ^{[1]}

Essam, Daryl ^{[1]}

Affiliations:

[1] Univ New S Wales, Australian Def Force Acad, Canberra, ACT 2600, Australia

Source:

2009 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL MODELS FOR LIFE SCIENCES (CMLS '09) | 2010 / Vol. 1210

Keywords:

Progressive Alignment; Multiple Sequence Alignment (MSA); Dynamic Programming (DP); Guide-tree; Genetic Algorithm (GA); PHYLOGENETIC TREES; ACCURACY;

DOI:

10.1063/1.3314271

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Designing life-saving drugs, such as cancer drugs, requires accurate protein and DNA structures. These structures depend on Multiple Sequence Alignment (MSA), which is used to find the accurate structure of protein and DNA sequences from existing, approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for MSA, we propose two algorithms in this paper: one combines an iterative approach with a progressive alignment method (PAMIM), and the other combines a genetic algorithm with a progressive alignment method (PAMGA). Both methods start from a k-mer distance table, which is used to generate a single guide-tree. In the iterative approach, we introduce two new techniques: the first generates guide-trees from randomly selected sequences, and the second shuffles the sequences inside the tree. The output of the tree is a multiple sequence alignment, which is evaluated by the Sum of Pairs Method (SPM) using the real-valued scores from PAM250. In the second, GA-based approach, these two techniques are used to generate the initial population, and two different approaches to genetic operators are implemented for crossover and mutation. To test the performance of the two algorithms, we compared them with well-known existing methods (T-Coffee, MUSCLE, MAFFT and ProbCons) on the BAliBASE benchmarks. The experimental results show that the first algorithm works well in some situations where the existing methods have difficulty obtaining better solutions. The second method performs well compared with the existing methods in all situations and outperforms the first.
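The two building blocks named in the abstract, a k-mer distance between sequences and a sum-of-pairs score for a finished alignment, can be illustrated with a minimal sketch. This is not the authors' implementation. The distance below is a simple Jaccard-style measure chosen for illustration, and `toy_score` is a hypothetical match/mismatch/gap scheme standing in for the PAM250 values used in the paper. All function names and parameter defaults are assumptions.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Jaccard-style k-mer distance in [0, 1] (an illustrative choice,
    not necessarily the distance used in the paper)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 1.0  # sequences shorter than k share no k-mers
    return 1.0 - len(ka & kb) / len(ka | kb)


def toy_score(x, y, match=2, mismatch=-1, gap=-2):
    """Hypothetical pair score standing in for a PAM250 lookup."""
    if x == "-" and y == "-":
        return 0  # a gap aligned with a gap is not penalised here
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment, score=toy_score):
    """Sum the pair score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += score(x, y)
    return total


if __name__ == "__main__":
    rows = ["AC-T", "ACGT", "A-GT"]
    print(sum_of_pairs(rows))
    print(kmer_distance("ACGTACGT", "ACGTTCGT"))
```

In a progressive aligner, a table of such pairwise distances would typically drive guide-tree construction, and the sum-of-pairs score would serve as the objective when comparing candidate alignments. Swapping `toy_score` for a real substitution-matrix lookup changes only the `score` argument.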


Pages: 69-83

Page count: 15

References (14 in total):

[1] ProbCons: Probabilistic consistency-based multiple sequence alignment [J].

Do, CB ;

Mahabhashyam, MSP ;

Brudno, M ;

Batzoglou, S .

GENOME RESEARCH, 2005, 15 (02) :330-340

[2] MUSCLE: multiple sequence alignment with high accuracy and high throughput [J].

Edgar, RC .

NUCLEIC ACIDS RESEARCH, 2004, 32 (05) :1792-1797

[3] PROGRESSIVE SEQUENCE ALIGNMENT AS A PREREQUISITE TO CORRECT PHYLOGENETIC TREES [J].

FENG, DF ;

DOOLITTLE, RF .

JOURNAL OF MOLECULAR EVOLUTION, 1987, 25 (04) :351-360

[4]

GOTOH O, 1993, COMPUT APPL BIOSCI, V9, P361

[5] MAFFT version 5: improvement in accuracy of multiple sequence alignment [J].

Katoh, K ;

Kuma, K ;

Toh, H ;

Miyata, T .

NUCLEIC ACIDS RESEARCH, 2005, 33 (02) :511-518

[6] DETECTING SUBTLE SEQUENCE SIGNALS - A GIBBS SAMPLING STRATEGY FOR MULTIPLE ALIGNMENT [J].

LAWRENCE, CE ;

ALTSCHUL, SF ;

BOGUSKI, MS ;

LIU, JS ;

NEUWALD, AF ;

WOOTTON, JC .

SCIENCE, 1993, 262 (5131) :208-214

[7] MULTIPLE ALIGNMENT USING SIMULATED ANNEALING - BRANCH POINT DEFINITION IN HUMAN MESSENGER-RNA SPLICING [J].

LUKASHIN, AV ;

ENGELBRECHT, J ;

BRUNAK, S .

NUCLEIC ACIDS RESEARCH, 1992, 20 (10) :2511-2516

[8] A GENERAL METHOD APPLICABLE TO SEARCH FOR SIMILARITIES IN AMINO ACID SEQUENCE OF 2 PROTEINS [J].

NEEDLEMAN, SB ;

WUNSCH, CD .

JOURNAL OF MOLECULAR BIOLOGY, 1970, 48 (03) :443-+

[9] SAGA: Sequence alignment by genetic algorithm [J].

Notredame, C ;

Higgins, DG .

NUCLEIC ACIDS RESEARCH, 1996, 24 (08) :1515-1524

[10] T-Coffee: A novel method for fast and accurate multiple sequence alignment [J].

Notredame, C ;

Higgins, DG ;

Heringa, J .

JOURNAL OF MOLECULAR BIOLOGY, 2000, 302 (01) :205-217
