Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets

Cited by: 20
Authors
Nute, Michael [1]
Saleh, Ehsan [2]
Warnow, Tandy [2, 3, 4]
Affiliations
[1] Univ Illinois, Dept Stat, 725 S Wright St 101, Champaign, IL 61820 USA
[2] Univ Illinois, Dept Comp Sci, 201 N Goodwin Ave, Urbana, IL 61801 USA
[3] Univ Illinois, Carl R Woese Inst Genom Biol, 1205 W Clark St, Urbana, IL 61801 USA
[4] Univ Illinois, Natl Ctr Supercomp Applicat, Urbana, IL 61801 USA
Funding
US National Science Foundation
Keywords
BAli-Phy; homology; multiple sequence alignment; protein sequences; structural alignment; MAXIMUM-LIKELIHOOD ALIGNMENT; ALGORITHM; ERRORS; MODEL; BENCHMARKING; NUCLEOTIDE; EVOLUTION; INFERENCE; ACCURACY; IMPACT;
DOI
10.1093/sysbio/syy068
Chinese Library Classification
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities.
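The precision and recall mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common pair-based convention, in which two residues count as aligned when they share a column; this record does not confirm that the paper uses exactly this definition. The function names `homology_pairs` and `precision_recall` and the toy sequences are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def homology_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    `alignment` is a list of equal-length strings, with '-' for gaps.
    Each residue is identified by (sequence index, ungapped position).
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    width = len(alignment[0]) if alignment else 0
    for col in range(width):
        residues = []
        for i, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((i, counters[i]))
                counters[i] += 1
        # Every pair of residues in this column is asserted homologous.
        pairs.update(combinations(residues, 2))
    return pairs


def precision_recall(estimated, reference):
    """Pair-based precision and recall of `estimated` against `reference`."""
    est = homology_pairs(estimated)
    ref = homology_pairs(reference)
    shared = len(est & ref)
    precision = shared / len(est) if est else 1.0
    recall = shared / len(ref) if ref else 1.0
    return precision, recall


# Toy data (invented): the same two sequences, ACGT and ACTGT.
reference = ["AC-GT", "ACTGT"]
estimated = ["AC---GT", "ACTGT--"]  # leaves the last residues unaligned

print(precision_recall(estimated, reference))  # (1.0, 0.5)
```

In this toy case every pair the estimate asserts is in the reference (precision 1.0), but half of the reference pairs are missing (recall 0.5). That is the pattern the abstract calls underalignment.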
Pages: 396-411
Page count: 16
Related papers
93 in total
[1]   ResBoost: characterizing and predicting catalytic residues in enzymes [J].
Alterovitz, Ron ;
Arvey, Aaron ;
Sankararaman, Sriram ;
Dallett, Carolina ;
Freund, Yoav ;
Sjoelander, Kimmen .
BMC BIOINFORMATICS, 2009, 10
[2]   SISYPHUS - structural alignments for proteins with non-trivial relationships [J].
Andreeva, Antonina ;
Prlic, Andreas ;
Hubbard, Tim J. P. ;
Murzin, Alexey G. .
NUCLEIC ACIDS RESEARCH, 2007, 35 :D253-D259
[3]   Issues in bioinformatics benchmarking: the case study of multiple sequence alignment [J].
Aniba, Mohamed Radhouene ;
Poch, Olivier ;
Thompson, Julie D. .
NUCLEIC ACIDS RESEARCH, 2010, 38 (21) :7353-7363
[4]   Protein evolution along phylogenetic histories under structurally constrained substitution models [J].
Arenas, Miguel ;
Dos Santos, Helena G. ;
Posada, David ;
Bastolla, Ugo .
BIOINFORMATICS, 2013, 29 (23) :3020-3028
[5]   BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations [J].
Bahr, A ;
Thompson, JD ;
Thierry, JC ;
Poch, O .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :323-326
[6]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[7]   Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence [J].
Bernardes, Juliana ;
Zaverucha, Gerson ;
Vaquero, Catherine ;
Carbone, Alessandra .
PLOS COMPUTATIONAL BIOLOGY, 2016, 12 (07)
[8]   MAXIMUM-LIKELIHOOD ALIGNMENT OF DNA-SEQUENCES [J].
BISHOP, MJ ;
THOMPSON, EA .
JOURNAL OF MOLECULAR BIOLOGY, 1986, 190 (02) :159-165
[9]   Class of Multiple Sequence Alignment Algorithm Affects Genomic Analysis [J].
Blackburne, Benjamin P. ;
Whelan, Simon .
MOLECULAR BIOLOGY AND EVOLUTION, 2013, 30 (03) :642-653
[10]   Blackshields Gordon, 2006, In Silico Biol, V6, P321