Missing data and the accuracy of Bayesian phylogenetics

Cited by: 156
Authors
Wiens, John J. [1]
Moen, Daniel S. [1]
Affiliations
[1] SUNY Stony Brook, Dept Ecol & Evolut, Stony Brook, NY 11794 USA
Keywords
accuracy; Bayesian analysis; missing data; phylogenetic analysis
DOI
10.3724/SP.J.1002.2008.08040
Chinese Library Classification number
Q94 [Botany];
Discipline classification code
071001;
Abstract
The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters for all taxa (e.g., fossils), missing data might seriously impede efforts to reconstruct a comprehensive phylogeny that includes all species. Fortunately, recent simulations and empirical analyses suggest that missing data cells are not themselves problematic, and that incomplete taxa can be accurately placed as long as the overall number of characters in the analysis is large. However, these studies have so far only been conducted on parsimony, likelihood, and neighbor joining methods. Although Bayesian phylogenetic methods have become widely used in recent years, the effects of missing data on Bayesian analysis have not been adequately studied. Here, we conduct simulations to test whether Bayesian analyses can accurately place incomplete taxa despite extensive missing data. In agreement with previous studies of other methods, we find that Bayesian analyses can accurately reconstruct the position of highly incomplete taxa (i.e., 95% missing data), as long as the overall number of characters in the analysis is large. These results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses.
Pages: 307-314
Number of pages: 8