STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time

Cited by: 48
Authors
Dalli, Deniz [1]
Wilm, Andreas [1]
Mainz, Indra [1]
Steger, Gerhard [1]
Affiliation
[1] Univ Dusseldorf, Inst Phys Biol, D-40225 Dusseldorf, Germany
DOI
10.1093/bioinformatics/btl142
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem.
Results: Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.
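The abstract names the ingredients of the scoring function (sequence similarity plus upstream and downstream pairing probability) but does not give its formula. The sketch below is an illustrative assumption, not the StrAl implementation: the function names, the square-root weighting, the `alpha` weight, and the match, mismatch and gap values are all hypothetical. It shows how per-position pairing probability vectors can be folded into a standard two-dimensional dynamic program that runs in time proportional to the product of the two sequence lengths.

```python
from math import sqrt


def column_score(a, b, i, j, up_a, down_a, up_b, down_b,
                 alpha=1.0, match=1.0, mismatch=-1.0):
    """Score for aligning a[i] with b[j] (hypothetical weighting).

    up_x[k] / down_x[k] are assumed to hold the probability that position k
    of sequence x pairs with a partner upstream / downstream of it.
    """
    # Structural term: large only when both positions tend to pair
    # in the same direction.
    struct = sqrt(up_a[i] * up_b[j]) + sqrt(down_a[i] * down_b[j])
    # Sequence term: simple identity scoring, an assumption for brevity.
    seq = match if a[i] == b[j] else mismatch
    return alpha * struct + seq


def align_score(a, b, up_a, down_a, up_b, down_b, gap=-2.0, alpha=1.0):
    """Global alignment score with linear gap costs (Needleman-Wunsch).

    Runs in O(len(a) * len(b)) time and keeps only two rows in memory.
    """
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            s = column_score(a, b, i - 1, j - 1,
                             up_a, down_a, up_b, down_b, alpha)
            cur[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
        prev = cur
    return prev[m]


if __name__ == "__main__":
    # Toy compensatory mutation: a G-C pair replaced by an A-U pair.
    # Position 0 pairs downstream, position 1 pairs upstream.
    up, down = [0.0, 0.9], [0.9, 0.0]
    print(align_score("GC", "AU", up, down, up, down, alpha=2.0))
    print(align_score("GC", "AU", up, down, up, down, alpha=0.0))
```

In the toy case the two sequences share no nucleotides, so the sequence-only run (`alpha=0.0`) is penalised in every column, while the structure-aware run credits the conserved pairing pattern. This is the situation the abstract describes for compensatory mutations.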
Pages: 1593-1599
Page count: 7