Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites

Cited by: 198
Authors
Chifman, Julia [1]
Kubatko, Laura [2, 3]
Affiliations
[1] Wake Forest Sch Med, Dept Canc Biol, Winston Salem, NC 27157 USA
[2] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[3] Ohio State Univ, Dept Evolut Ecol & Organismal Biol, Columbus, OH 43210 USA
Funding
US National Science Foundation
Keywords
Phylogenetics; Identifiability; Algebraic statistics; MAXIMUM-LIKELIHOOD; GENE TREES; DNA-SEQUENCES; EVOLUTIONARY TREES; LINEAR INVARIANTS; PHYLOGENY; PROBABILITIES; INFERENCE; COVARION;
DOI
10.1016/j.jtbi.2015.03.006
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
The inference of the evolutionary history of a collection of organisms is a problem of fundamental importance in evolutionary biology. The abundance of DNA sequence data arising from genome sequencing projects has led to significant challenges in the inference of these phylogenetic relationships. Among these challenges is the inference of the evolutionary history of a collection of species based on sequence information from several distinct genes sampled throughout the genome. It is widely accepted that each individual gene has its own phylogeny, which may not agree with the species tree. Many possible causes of this gene tree incongruence are known. The best studied is incomplete lineage sorting, which is commonly modeled by the coalescent process. Numerous methods based on the coalescent process have been proposed for the estimation of the phylogenetic species tree given DNA sequence data. However, use of these methods assumes that the phylogenetic species tree can be identified from DNA sequence data at the leaves of the tree, although this has not been formally established. We prove that the unrooted topology of the n-leaf phylogenetic species tree is generically identifiable given observed data at the leaves of the tree that are assumed to have arisen from the coalescent process under a time-reversible substitution process with the possibility of site-specific rate variation modeled by the discrete gamma distribution and a proportion of invariable sites. Published by Elsevier Ltd.
Pages: 35-47
Page count: 13