RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment

Cited by: 4
Authors
Bayegan, Amir H. [1]
Clote, Peter [1]
Affiliations
[1] Boston Coll, Biol Dept, Chestnut Hill, MA 02167 USA
Funding
National Science Foundation (USA);
Keywords
STRUCTURAL ALIGNMENT; SECONDARY STRUCTURE; SEQUENCE ALIGNMENT; WEB SERVER; ALGORITHM; FAMILIES; FEATURES; SEARCH; GENOME; ENERGY;
DOI
10.1371/journal.pone.0227177
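Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes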
07 ; 0710 ; 09 ;
Abstract
Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
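To make the terms in the abstract concrete, the following is a minimal Python sketch, not the RNAmountAlign implementation. It assumes a single fixed secondary structure in dot-bracket notation, whereas the abstract's cubic-time bound suggests the software derives its representation from a more expensive computation. The function names, the weight `gamma`, and the form of the structural term are all hypothetical and chosen only for illustration. The conversion between p-value and E uses the standard Karlin-Altschul relation p = 1 - exp(-E).

```python
import math


def incremental_mountain_height(dot_bracket):
    """Per-position step of the mountain curve for one fixed structure:
    +1 at '(', -1 at ')', 0 at '.'. (Simplifying assumption: a single
    structure, not a structural ensemble.)"""
    step = {"(": 1, ")": -1, ".": 0}
    return [step[c] for c in dot_bracket]


def mountain_height(dot_bracket):
    """Running sum of the incremental heights, i.e. the number of base
    pairs still open after each position."""
    heights, h = [], 0
    for inc in incremental_mountain_height(dot_bracket):
        h += inc
        heights.append(h)
    return heights


def combined_similarity(seq_score, inc_a, inc_b, gamma=0.5):
    """Hypothetical weighted combination of sequence and structural
    similarity for one aligned pair of positions. The structural term
    (negative absolute difference of incremental heights) and the
    weight gamma are illustrative assumptions, not taken from the paper."""
    struct_score = -abs(inc_a - inc_b)
    return (1 - gamma) * seq_score + gamma * struct_score


def p_value_to_e_value(p):
    """Standard Karlin-Altschul relation p = 1 - exp(-E), solved for E."""
    return -math.log1p(-p)
```

With a score of this form, any quadratic-time dynamic programming scheme for local or global sequence alignment could be reused unchanged, since the structural information enters only through the per-position values.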
Pages: 34