Properties of Consensus Methods for Inferring Species Trees from Gene Trees

Times cited: 105
Authors
Degnan, James H. [1]
DeGiorgio, Michael [2]
Bryant, David [3]
Rosenberg, Noah A. [1,2]
Affiliations
[1] Univ Michigan, Dept Human Genet, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Ctr Computat Med & Biol, Ann Arbor, MI 48109 USA
[3] Univ Auckland, Dept Math, Auckland, New Zealand
Funding
US National Science Foundation; US National Institutes of Health
Keywords
Anomalous gene tree; coalescence; discordance; lineage sorting; phylogenetics; statistical consistency; phylogenetic inference; concordance; distributions; probability; sequences; mixtures
DOI
10.1093/sysbio/syp008
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Consensus methods provide a useful strategy for summarizing information from a collection of gene trees. An important application of consensus methods is to combine gene trees to estimate a species tree. To investigate the theoretical properties of consensus trees that would be obtained from large numbers of loci evolving according to a basic evolutionary model, we construct consensus trees from rooted gene trees that occur in proportion to gene-tree probabilities derived from coalescent theory. We consider majority-rule, rooted triple (R-*), and greedy consensus trees obtained from known, rooted gene trees, both in the asymptotic case as numbers of gene trees approach infinity and for finite numbers of genes. Our results show that for some combinations of species-tree branch lengths, increasing the number of independent loci can make the rooted majority-rule consensus tree more likely to be at least partially unresolved. However, the probability that the R-* consensus tree has the species-tree topology approaches 1 as the number of gene trees approaches infinity. Although the greedy consensus algorithm can be the quickest to converge on the correct species-tree topology when increasing the number of gene trees, it can also be positively misleading. The majority-rule consensus tree is not a misleading estimator of the species-tree topology, and the R-* consensus tree is a statistically consistent estimator of the species-tree topology. Our results therefore suggest a method for using multiple loci to infer the species-tree topology, even when it is discordant with the most likely gene tree.
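As a minimal illustration of two of the abstract's claims, the sketch below treats only the simplest case: three taxa with species tree ((A,B),C) and internal branch length t in coalescent units. It assumes the standard three-taxon coalescent result that the matching gene tree ((A,B),C) has probability 1 - (2/3)e^(-t) and each discordant gene tree has probability (1/3)e^(-t). Under these frequencies the clade {A,B} exceeds 1/2 only when t > ln(4/3) ≈ 0.288, so below that threshold the limiting majority-rule tree is unresolved, whereas the most frequent rooted triple always matches the species tree, which is the three-taxon core of the R* argument. All function names are hypothetical; this is not the authors' code, and neither greedy consensus nor the general n-taxon R* construction is shown.

```python
import math
import random
from collections import Counter

# A rooted three-taxon gene tree is represented by its single nontrivial
# clade (cherry): AB stands for ((A,B),C), AC for ((A,C),B), BC for ((B,C),A).
AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")


def three_taxon_probs(t):
    """Gene-tree probabilities for species tree ((A,B),C) with internal branch
    length t (coalescent units), assuming the standard three-taxon coalescent."""
    p_discordant = math.exp(-t) / 3.0
    return {AB: 1.0 - 2.0 * p_discordant, AC: p_discordant, BC: p_discordant}


def sample_frequencies(t, n, rng):
    """Observed gene-tree frequencies among n independent loci."""
    probs = three_taxon_probs(t)
    trees = list(probs)
    draws = rng.choices(trees, weights=[probs[c] for c in trees], k=n)
    counts = Counter(draws)
    return {c: counts[c] / n for c in trees}


def majority_rule_clades(freqs):
    """Clades present in strictly more than half of the gene trees."""
    return {c for c, f in freqs.items() if f > 0.5}


def rstar_triple(freqs):
    """Strictly most frequent rooted triple, or None if the top two tie.
    For three taxa this single triple is the whole R* consensus."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0] if ranked[0][1] > ranked[1][1] else None


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.1, 1.0):
        freqs = sample_frequencies(t, 10_000, rng)
        majority = [sorted(c) for c in majority_rule_clades(freqs)]
        triple = sorted(rstar_triple(freqs) or [])
        print(f"t={t}: majority-rule clades={majority}, R* triple={triple}")
```

With t = 0.1 the expected frequency of {A,B} is about 0.40, so with many loci the majority-rule clade set is typically empty while the R* triple is still ((A,B),C); with t = 1.0 both methods recover the species tree.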
Pages: 35-54
Page count: 20