Specific alignment of structured RNA: stochastic grammars and sequence annealing

Cited by: 23
Authors
Bradley, Robert K. [2]
Pachter, Lior [1]
Holmes, Ian [2, 3]
Affiliations
[1] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Biophys Grad Grp, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Bioengn, Berkeley, CA 94720 USA
DOI
10.1093/bioinformatics/btn495
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment that couples a generative probabilistic model of sequence and structure with an efficient sequence annealing method for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences.
Results: When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to, and better specificity than, the best competing methods. We use a large-scale random-sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analyses, from inferring phylogeny to estimating substitution rates across different lineages.
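The trade-off between sensitivity and specificity described in the abstract can be made concrete with pair-based alignment scores. The sketch below assumes the definitions common in alignment benchmarking: sensitivity is the fraction of residue pairs aligned in the reference alignment that the prediction also aligns, and specificity is the fraction of predicted aligned pairs that also appear in the reference (sometimes called positive predictive value). This record does not state the exact definitions the authors used, and the helper names `aligned_pairs`, `sensitivity` and `specificity` are hypothetical; this is not Stemloc-AMA's evaluation code.

```python
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical helpers. Assumes pair-based scores, with "specificity" taken
# to mean positive predictive value (fraction of predicted pairs that are correct).

# An aligned residue pair: (row i, residue index in row i, row j, residue index in row j), i < j.
Pair = Tuple[int, int, int, int]


def aligned_pairs(rows: List[str], gap: str = "-") -> Set[Pair]:
    """Return every pair of residues that a gapped alignment places in the same column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all alignment rows must be non-empty and of equal length")
    pairs: Set[Pair] = set()
    next_residue = [0] * len(rows)  # ungapped index of the next residue in each row
    for col in range(len(rows[0])):
        occupied = []
        for r, row in enumerate(rows):
            if row[col] != gap:
                occupied.append((r, next_residue[r]))
                next_residue[r] += 1
        # Any two residues sharing a column count as one predicted homology.
        for (i, x), (j, y) in combinations(occupied, 2):
            pairs.add((i, x, j, y))
    return pairs


def sensitivity(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """Fraction of reference pairs recovered (1.0 by convention if the reference aligns nothing)."""
    return len(predicted & reference) / len(reference) if reference else 1.0


def specificity(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """Fraction of predicted pairs found in the reference (1.0 by convention if nothing is predicted)."""
    return len(predicted & reference) / len(predicted) if predicted else 1.0


if __name__ == "__main__":
    # Two unrelated sequences: the reference aligns none of their residues.
    reference = aligned_pairs(["ACGU----", "----GGCA"])
    over_aligned = aligned_pairs(["ACGU", "GGCA"])  # forces 4 spurious pairs
    left_unaligned = aligned_pairs(["ACGU----", "----GGCA"])
    print(sensitivity(over_aligned, reference), specificity(over_aligned, reference))      # 1.0 0.0
    print(sensitivity(left_unaligned, reference), specificity(left_unaligned, reference))  # 1.0 1.0
```

Under these assumed definitions, both predictions for the unrelated pair score a sensitivity of 1.0, so only specificity separates the complete alignment of non-homologous sequences from the one that leaves them unaligned, which is the distinction the random-sequence experiment in the abstract measures.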
Pages: 2677-2683
Number of pages: 7