ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Cited by: 317
Authors
Steenwyk, Jacob L. [1]
Buida, Thomas J., III
Li, Yuanning [1]
Shen, Xing-Xing [2]
Rokas, Antonis [1]
Affiliations
[1] Vanderbilt Univ, Dept Biol Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[2] Zhejiang Univ, Key Lab Mol Biol Crop Pathogens & Insects, Minist Agr, Inst Insect Sci, Hangzhou, Peoples R China
Funding
US National Institutes of Health; US National Science Foundation
Keywords
TREE; COALESCENT; PLACEMENT; SISTER; SITES; RATES; TOOL;
DOI
10.1371/journal.pbio.3001007
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
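The idea of retaining parsimony-informative sites can be illustrated with a minimal sketch. It is not ClipKIT's own code. It assumes the standard definition of a parsimony-informative site (an alignment column in which at least two character states each occur in at least two sequences) and treats `-`, `?`, and `*` as gap or missing-data characters to ignore. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

# Assumption: these characters mark gaps or missing data and are ignored.
GAP_CHARS = set("-?*")


def is_parsimony_informative(column):
    """Return True if at least two states each occur at least twice."""
    counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_parsimony_informative(alignment):
    """Keep only parsimony-informative columns of a {name: sequence} alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same aligned length")
    names = list(alignment)
    # Transpose rows (sequences) into columns (sites).
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if is_parsimony_informative(col)]
    # Rebuild each sequence from the retained columns.
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Hypothetical toy alignment: four taxa, six sites.
toy = {
    "taxon_a": "ACGTA-",
    "taxon_b": "ACGAA-",
    "taxon_c": "ATGAC-",
    "taxon_d": "ATGTCA",
}
trimmed = keep_parsimony_informative(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

In this toy alignment, the constant sites (first and third columns) and the gap-dominated last column are dropped, and the three columns in which two states each appear twice are retained.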
Pages: 17
Related papers
37 in total
[1]   trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses [J].
Capella-Gutierrez, Salvador ;
Silla-Martinez, Jose M. ;
Gabaldon, Toni .
BIOINFORMATICS, 2009, 25 (15) :1972-1973
[2]   Biopython: freely available Python tools for computational molecular biology and bioinformatics [J].
Cock, Peter J. A. ;
Antao, Tiago ;
Chang, Jeffrey T. ;
Chapman, Brad A. ;
Cox, Cymon J. ;
Dalke, Andrew ;
Friedberg, Iddo ;
Hamelryck, Thomas ;
Kauff, Frank ;
Wilczynski, Bartek ;
de Hoon, Michiel J. L. .
BIOINFORMATICS, 2009, 25 (11) :1422-1423
[3]   BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments [J].
Criscuolo, Alexis ;
Gribaldo, Simonetta .
BMC EVOLUTIONARY BIOLOGY, 2010, 10
[4]   UFBoot2: Improving the Ultrafast Bootstrap Approximation [J].
Diep Thi Hoang ;
Chernomor, Olga ;
von Haeseler, Arndt ;
Minh, Bui Quang ;
Le Sy Vinh .
MOLECULAR BIOLOGY AND EVOLUTION, 2018, 35 (02) :518-522
[5]   Noisy: Identification of problematic columns in multiple sequence alignments [J].
Dress, Andreas W. M. ;
Flamm, Christoph ;
Fritzsch, Guido ;
Gruenewald, Stefan ;
Kruspe, Matthias ;
Prohaska, Sonja J. ;
Stadler, Peter F. .
ALGORITHMS FOR MOLECULAR BIOLOGY, 2008, 3 (1)
[6]   integRATE: a desirability-based data integration framework for the prioritization of candidate genes across heterogeneous omics and its application to preterm birth [J].
Eidem, Haley R. ;
Steenwyk, Jacob L. ;
Wisecaver, Jennifer H. ;
Capra, John A. ;
Abbot, Patrick ;
Rokas, Antonis .
BMC MEDICAL GENOMICS, 2018, 11
[7]   INDELible: A Flexible Simulator of Biological Sequence Evolution [J].
Fletcher, William ;
Yang, Ziheng .
MOLECULAR BIOLOGY AND EVOLUTION, 2009, 26 (08) :1879-1888
[8]   Whole-genome analyses resolve early branches in the tree of life of modern birds [J].
Jarvis, Erich D. ;
Mirarab, Siavash ;
Aberer, Andre J. ;
Li, Bo ;
Houde, Peter ;
Li, Cai ;
Ho, Simon Y. W. ;
Faircloth, Brant C. ;
Nabholz, Benoit ;
Howard, Jason T. ;
Suh, Alexander ;
Weber, Claudia C. ;
da Fonseca, Rute R. ;
Li, Jianwen ;
Zhang, Fang ;
Li, Hui ;
Zhou, Long ;
Narula, Nitish ;
Liu, Liang ;
Ganapathy, Ganesh ;
Boussau, Bastien ;
Bayzid, Md. Shamsuzzoha ;
Zavidovych, Volodymyr ;
Subramanian, Sankar ;
Gabaldon, Toni ;
Capella-Gutierrez, Salvador ;
Huerta-Cepas, Jaime ;
Rekepalli, Bhanu ;
Munch, Kasper ;
Schierup, Mikkel ;
Lindow, Bent ;
Warren, Wesley C. ;
Ray, David ;
Green, Richard E. ;
Bruford, Michael W. ;
Zhan, Xiangjiang ;
Dixon, Andrew ;
Li, Shengbin ;
Li, Ning ;
Huang, Yinhua ;
Derryberry, Elizabeth P. ;
Bertelsen, Mads Frost ;
Sheldon, Frederick H. ;
Brumfield, Robb T. ;
Mello, Claudio V. ;
Lovell, Peter V. ;
Wirthlin, Morgan ;
Cruz Schneider, Maria Paula ;
Prosdocimi, Francisco ;
Samaniego, Jose Alfredo .
SCIENCE, 2014, 346 (6215) :1320-1331
[9]   Phylogenetic tree building in the genomic age [J].
Kapli, Paschalia ;
Yang, Ziheng ;
Telford, Maximilian J. .
NATURE REVIEWS GENETICS, 2020, 21 (07) :428-444
[10]   Kassambara A, 2020, Ggpubr