Detecting Phylogenetic Breakpoints and Discordance from Genome-Wide Alignments for Species Tree Reconstruction

Times cited: 28
Author
Ane, Cecile [1, 2]
Affiliations
[1] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[2] Univ Wisconsin, Dept Bot, Madison, WI 53706 USA
Funding
National Science Foundation (USA)
Keywords
phylogenomics; minimum description length; Bayesian concordance analysis; recombination; horizontal transfer; incomplete lineage sorting; DNA-SEQUENCE ALIGNMENTS; MAXIMUM-LIKELIHOOD; MIXTURE MODEL; RECOMBINATION; GENE; CONCORDANCE; HETEROGENEITY; HETEROTACHY; PARSIMONY; INFERENCE
DOI
10.1093/gbe/evr013
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
With sequence data now easy to acquire, it is possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimit these loci, I describe a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods.
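The abstract describes the partitioning step only at a high level. As an illustration, the following is a minimal Python sketch of the generic dynamic-programming recursion for optimal segmentation under an additive score, best[j] = min over i < j of best[i] + cost(i, j) + penalty, followed by a length-weighted proportion of the genome supporting a clade. It is not the author's implementation. The per-locus cost function, the constant breakpoint penalty, the toy per-site labels, and the names `mdl_partition`, `toy_cost` and `genome_proportion` are all hypothetical. A true MDL score would encode each locus topology and the breakpoint positions in bits. Bayesian concordance analysis also averages over posterior tree samples, whereas `genome_proportion` uses a single point-estimate tree per locus.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Locus = Tuple[int, int]  # half-open site interval [start, end)


def mdl_partition(
    n_sites: int,
    segment_cost: Callable[[int, int], float],
    breakpoint_penalty: float,
) -> Tuple[float, List[Locus]]:
    """Split sites [0, n_sites) into contiguous loci minimizing the summed
    segment_cost over loci plus breakpoint_penalty per breakpoint (hypothetical score)."""
    best = [float("inf")] * (n_sites + 1)  # best[j]: optimal cost of sites [0, j)
    start = [0] * (n_sites + 1)            # start[j]: first site of the last locus
    best[0] = 0.0
    for j in range(1, n_sites + 1):
        for i in range(j):
            # A breakpoint is paid only when the last locus does not begin at site 0.
            penalty = breakpoint_penalty if i > 0 else 0.0
            cost = best[i] + segment_cost(i, j) + penalty
            if cost < best[j]:
                best[j], start[j] = cost, i
    # Backtrack from the end of the alignment to recover the locus boundaries.
    loci: List[Locus] = []
    j = n_sites
    while j > 0:
        loci.append((start[j], j))
        j = start[j]
    loci.reverse()
    return best[n_sites], loci


def genome_proportion(
    loci: Sequence[Tuple[int, Set[FrozenSet[str]]]], clade: FrozenSet[str]
) -> float:
    """Length-weighted fraction of sites whose (point-estimate) locus tree contains `clade`."""
    total = sum(length for length, _ in loci)
    support = sum(length for length, clades in loci if clade in clades)
    return support / total


# Hypothetical toy data: the best-fitting topology ("A" or "B") at each site.
labels = ["A"] * 5 + ["B"] * 5


def toy_cost(i: int, j: int) -> float:
    # Number of sites in [i, j) that disagree with the single best tree for that locus.
    return (j - i) - max(Counter(labels[i:j]).values())


print(mdl_partition(len(labels), toy_cost, 2.0))   # (2.0, [(0, 5), (5, 10)])
print(mdl_partition(len(labels), toy_cost, 10.0))  # (5.0, [(0, 10)])

ab, ac = frozenset({"a", "b"}), frozenset({"a", "c"})
print(genome_proportion([(600, {ab}), (400, {ac})], ab))  # 0.6
```

With a penalty of 2, the toy alignment splits at site 5 into two loci that each cost 0. Raising the penalty to 10 makes a single locus with 5 mismatching sites the cheaper description. The double loop calls the cost function O(n²) times, so a practical version would likely restrict candidate breakpoints, for example to block boundaries.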
Pages: 246-258
Page count: 13