HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy

Cited by: 143
Authors
Zou, Quan [1, 2]
Hu, Qinghua [1, 3]
Guo, Maozu [3 ]
Wang, Guohua [3 ]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Minist Educ, State Key Lab Syst Bioengn, Tianjin 300072, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
Keywords
16S RIBOSOMAL-RNA; FREQUENCY PROFILES; PREDICTION; ALGORITHM; SERVER
DOI
10.1093/bioinformatics/btv177
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Multiple sequence alignment (MSA) is an important task, but bottlenecks arise in the massive MSA of homologous DNA or genome sequences. Most available state-of-the-art software tools either cannot handle large-scale datasets or run rather slowly. The similarity of homologous DNA sequences is often ignored, and the lack of parallelization remains a challenge for MSA research.
Results: We developed two software tools to address the DNA MSA problem. The first employed trie trees to accelerate the centre star MSA strategy, decreasing the expected time complexity from quadratic to linear. To address large-scale data, parallelism was applied using the Hadoop platform. Experiments demonstrated the performance of our proposed methods, including their running time, sum-of-pairs scores and scalability. Moreover, we supplied two massive DNA/RNA MSA datasets for further testing and research.
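The abstract does not spell out how the trie is used, so the following is only an illustrative sketch, not the HAlign implementation. It assumes the centre sequence is cut into fixed-length segments, that these segments are stored in a trie, and that each other sequence is scanned for exact segment matches that could serve as alignment anchors. The function names (`split_segments`, `build_trie`, `find_anchors`) and the segment length `k` are hypothetical. This naive scan costs O(len(seq) * k) per sequence; an Aho-Corasick automaton would be needed to make the scan strictly linear.

```python
# Illustrative sketch only: names and the fixed segment length are assumptions,
# not taken from the paper.

END = "$"  # terminal marker; safe because sequences only contain A, C, G, T, U, N


def split_segments(centre, k):
    """Cut the centre sequence into non-overlapping segments of length k.

    Returns (offset, segment) pairs; a trailing partial segment is dropped.
    """
    return [(i, centre[i:i + k]) for i in range(0, len(centre) - k + 1, k)]


def build_trie(segments):
    """Build a dict-based trie; terminal nodes store centre offsets."""
    root = {}
    for offset, seg in segments:
        node = root
        for ch in seg:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(offset)
    return root


def find_anchors(trie, seq, k):
    """Return (centre_offset, seq_offset) for every exact segment match in seq."""
    anchors = []
    for start in range(len(seq) - k + 1):
        node = trie
        for ch in seq[start:start + k]:
            node = node.get(ch)
            if node is None:
                break  # no centre segment starts with this prefix
        else:
            for offset in node.get(END, []):
                anchors.append((offset, start))
    return anchors


centre = "ACGTGGCATTCA"
other = "ACGTAGGCATTCA"  # centre with one extra 'A' inserted after "ACGT"
trie = build_trie(split_segments(centre, 4))
anchors = find_anchors(trie, other, 4)
print(anchors)
```

In this toy input the three centre segments are all found in the second sequence, shifted by one position after the insertion. Under the stated assumptions, only the short regions between consecutive anchors would still need a conventional pairwise alignment, which is where the saving for highly similar sequences would come from.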
Pages: 2475-2481
Page count: 7