Evaluating the Accuracy and Efficiency of Multiple Sequence Alignment Methods

Times cited: 34
Authors
Pervez, Muhammad Tariq [1, 2]
Babar, Masroor Ellahi [3]
Nadeem, Asif [2]
Aslam, Muhammad [4]
Awan, Ali Raza [2]
Aslam, Naeem [2, 5]
Hussain, Tanveer [2]
Naveed, Nasir [6]
Qadri, Salman [7]
Waheed, Usman [1]
Shoaib, Muhammad [4]
Affiliations
[1] Virtual Univ Pakistan, Dept Comp Sci, Lahore, Pakistan
[2] Univ Vet & Anim Sci, Inst Biochem & Biotechnol, Lahore, Pakistan
[3] Virtual Univ Pakistan, Dept Bioinformat, Lahore, Pakistan
[4] Univ Engn & Technol, Dept Comp Sci & Engn, Lahore, Pakistan
[5] NFC Inst Engn & Technol Training, Dept Comp Sci, Multan, Pakistan
[6] Univ Koblenz Landau, Landau, Germany
[7] Islamia Univ Bahawalpur, Dept Comp Sci, Bahawalpur, Pakistan
Source
EVOLUTIONARY BIOINFORMATICS | 2014 / Vol. 10
Keywords
Multiple Sequence Alignment Tools; comparative study of MSA tools; sum of pairs score; column score; evolutionary parameters; PROTEIN; PERFORMANCE; MUSCLE; MAFFT
DOI
10.4137/EBO.S19199
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
A comparison of the 10 most popular Multiple Sequence Alignment (MSA) tools, namely MUSCLE, MAFFT (L-INS-i), MAFFT (FFT-NS-2), T-Coffee, ProbCons, SATe, Clustal Omega, Kalign, Multalin, and Dialign-TX, is presented. We also focused on the significance of some implementation details embedded in the algorithm of each tool. Based on 10 simulated trees with different numbers of taxa generated in R, 400 known alignments and sequence files were constructed using indel-Seq-Gen. A total of 4000 test alignments were generated to study the effect of sequence length, indel size, deletion rate, and insertion rate. Results showed that alignment quality was highly dependent on the number of deletions and insertions in the sequences, whereas sequence length and indel size had a weaker effect. Overall, ProbCons was consistently at the top of the list of evaluated MSA tools. SATe, although slightly less accurate, was 529.10% faster than ProbCons and 236.72% faster than MAFFT (L-INS-i). Among the other tools, Kalign and MUSCLE achieved the highest sum-of-pairs scores. We also considered the BAliBASE benchmark datasets, and the results for BAliBASE and indel-Seq-Gen-generated alignments were consistent in most cases.
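The keywords name the sum-of-pairs score and the column score as accuracy measures. The following is a minimal sketch of how these two scores are commonly computed against a known reference alignment: the sum-of-pairs score is taken as the fraction of residue pairs aligned in the reference that the test alignment also aligns, and the column score as the fraction of reference columns reproduced exactly. It is not the authors' scoring code. The function names, the `-` gap character, and the choice to ignore gap-only columns are assumptions made for illustration, and published scoring programs may treat partially gapped columns differently.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

GAP = "-"  # assumed gap character
Column = Tuple[Optional[int], ...]  # residue index per sequence, None = gap
Pair = Tuple[int, int, int, int]    # (seq_i, residue_i, seq_j, residue_j)


def to_columns(alignment: List[str]) -> List[Column]:
    """Turn gapped rows into columns of per-sequence residue indices."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and equally long")
    next_residue = [0] * len(alignment)
    cols: List[Column] = []
    for pos in range(len(alignment[0])):
        col: List[Optional[int]] = []
        for s, row in enumerate(alignment):
            if row[pos] == GAP:
                col.append(None)
            else:
                col.append(next_residue[s])
                next_residue[s] += 1
        if any(x is not None for x in col):  # assumption: skip gap-only columns
            cols.append(tuple(col))
    return cols


def aligned_pairs(cols: List[Column]) -> Set[Pair]:
    """Collect every pair of residues placed in the same column."""
    pairs: Set[Pair] = set()
    for col in cols:
        for i, j in combinations(range(len(col)), 2):
            if col[i] is not None and col[j] is not None:
                pairs.add((i, col[i], j, col[j]))
    return pairs


def _check_same_sequences(test: List[str], ref: List[str]) -> None:
    """Both alignments must contain the same ungapped sequences, in order."""
    if [r.replace(GAP, "") for r in test] != [r.replace(GAP, "") for r in ref]:
        raise ValueError("test and reference must align the same sequences")


def sum_of_pairs_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    _check_same_sequences(test, ref)
    ref_pairs = aligned_pairs(to_columns(ref))
    if not ref_pairs:
        raise ValueError("reference contains no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(to_columns(test))) / len(ref_pairs)


def column_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference columns reproduced exactly by the test alignment."""
    _check_same_sequences(test, ref)
    ref_cols = to_columns(ref)
    if not ref_cols:
        raise ValueError("reference contains no residues")
    test_cols = set(to_columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


ref_aln = ["AC-GT", "ACTGT"]   # hypothetical known alignment
test_aln = ["ACG-T", "ACTGT"]  # hypothetical tool output
print(sum_of_pairs_score(test_aln, ref_aln))  # 0.75
print(column_score(test_aln, ref_aln))        # 0.6
```

In this toy case the test alignment misplaces one residue (the G of the first sequence), so three of the four reference residue pairs and three of the five reference columns are reproduced. The column score is the stricter measure, because a single misplaced residue invalidates every column it touches.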
Pages: 13