Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny

Times cited: 366
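Author
Edgar, Robert C.
Affiliation
[1] Independent Researcher
Keywords
UNCERTAINTY; BENCHMARK; ALGORITHM; DATABASE; TREE
DOI
10.1038/s41467-022-34630-w
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract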
Multiple sequence alignments are widely used to predict protein structure, function, and phylogeny, but they become increasingly uncertain as sequences diverge. Muscle5 generates ensembles of alternative high-accuracy alignments, enabling novel confidence estimates for alignments, trees, and other inferences.

Multiple sequence alignments are widely used to infer evolutionary relationships, enabling inferences of structure, function, and phylogeny. Standard practice is to construct one alignment by some preferred method and use it in further analysis; however, undetected alignment bias can be problematic. I describe Muscle5, a novel algorithm that constructs an ensemble of high-accuracy alignments with diverse biases by perturbing a hidden Markov model and permuting its guide tree. Confidence in an inference is assessed as the fraction of the ensemble that supports it. Applied to phylogenetic tree estimation, I show that ensembles can confidently resolve topologies that have low bootstrap support according to standard methods and, conversely, that some topologies with high bootstrap support are incorrect. Applied to the phylogeny of RNA viruses, ensemble analysis shows that recently adopted taxonomic phyla are probably polyphyletic. Ensemble analysis can improve confidence assessment in any inference from an alignment.
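The confidence measure described in the abstract, the fraction of ensemble members that support an inference, can be illustrated with a minimal Python sketch. It assumes that a tree has already been estimated from each alignment in the ensemble and that each tree is written as nested tuples of taxon names, treated as rooted for simplicity. The helper names clades and ensemble_support, the tree representation, and the toy trees are hypothetical; they are not part of Muscle5 or its output formats.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical representation (not a Muscle5 format): a rooted tree is either a
# taxon name (leaf) or a tuple of subtrees (internal node).
Tree = Union[str, Tuple["Tree", ...]]
Clade = FrozenSet[str]


def clades(tree: Tree) -> Set[Clade]:
    """Return the leaf sets of all non-root internal nodes of a rooted tree."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # the all-taxa clade is trivially present in every tree
    return found


def ensemble_support(trees: List[Tree]) -> Dict[Clade, float]:
    """Support for each clade = fraction of ensemble trees that contain it."""
    if not trees:
        raise ValueError("the ensemble must contain at least one tree")
    counts: Counter = Counter()
    for tree in trees:
        counts.update(clades(tree))  # a set, so each clade counts once per tree
    return {clade: n / len(trees) for clade, n in counts.items()}


# Toy ensemble: one tree per (hypothetical) alternative alignment.
ensemble = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]
support = ensemble_support(ensemble)
print(support[frozenset({"A", "B"})])  # 0.75
print(support[frozenset({"C", "D"})])  # 0.5
```

Here the clade {A, B} appears in three of the four trees, so its ensemble support is 0.75. A real analysis would compare unrooted bipartitions rather than rooted clades and would read trees from a standard format such as Newick; both are omitted to keep the sketch short.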
Pages: 9