When being "most likely" is not enough: Examining the performance of three uses of the parametric bootstrap in phylogenetics

Cited by: 13
Author
Antezana, M [1]
Affiliation
[1] Univ Chicago, Dept Ecol & Evolut, Chicago, IL 60637 USA
Keywords
parametric bootstrap; resampling bootstrap; topology test; phylogenetics; star tree; dichotomous tree; four-taxon case; site-pattern; conservativeness; critical length; confidence interval; null hypothesis; hypothesis testing; discreteness; sparseness; homoplasy; p-value; type I error; power
DOI
10.1007/s00239-002-2394-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
I show that three parametric-bootstrap (PB) applications that have been proposed for phylogenetic analysis can be misleading as currently implemented. First, I show that simulating a topology estimated from preliminary data, in order to determine the sequence length that should allow the best tree obtained from more extensive data to be correct with a desired probability, delivers an accurate estimate of this length only in topological situations in which most preliminary trees are expected to be both correct and statistically significant, i.e., when no further analysis would be needed. Otherwise, one obtains strong underestimates of the length, or similarly biased values for incorrect trees. Second, I show that PB-based topology tests that use as the null hypothesis the most likely tree congruent with a pre-specified topological relationship alternative to the unconstrained most likely tree, and that simulate this tree for P value estimation, produce excessive type I error (from 50% to 600% and higher) when they are applied to null data generated by star-shaped or dichotomous four-taxon topologies. Simulating the most likely star topology for P value estimation instead results in correct type I error rates, even when the null data are generated by a dichotomous topology. This is a strong indication that the star topology is the correct default null hypothesis for phylogenies. Third, I show that PB-estimated confidence intervals (CIs) for the length of a tree branch are generally accurate, although in some situations they can be strongly over- or underestimated relative to the "true" CI. Attempts to identify a biased CI through a further round of simulations were unsuccessful. Tracing the origin and propagation of parameter-estimate error through the CI estimation exercise showed that the sparseness of site-patterns that are crucial to the estimation of pivotal parameters can allow homoplasy to bias these estimates and, ultimately, the PB-based CI estimation.
In conclusion, I stress that statistical techniques that simulate models estimated from limited data need to be carefully calibrated, and I defend the point that pattern-sparseness assessment will be the next frontier in the statistical analysis of phylogenies, an effort that will require taking advantage of the merits of black-box maximum-likelihood approaches and of insights from intuitive, site-pattern-oriented approaches like parsimony.
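As a rough illustration of the third application (a PB-estimated confidence interval for a branch length), the sketch below uses a much simpler setting than the four-taxon maximum-likelihood analyses the abstract describes. It assumes just two sequences under the Jukes-Cantor model, where the branch length has a closed-form estimate, and a percentile interval. The function names `jc_distance`, `jc_p_diff`, and `pb_interval`, and all default parameter values, are hypothetical and not taken from the paper.

```python
import math
import random


def jc_distance(k, n):
    """Jukes-Cantor distance estimate from k differing sites out of n."""
    p = k / n
    if p >= 0.75:
        # Saturated data: the estimate is undefined, so report infinity.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_p_diff(d):
    """Expected proportion of differing sites at Jukes-Cantor distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def pb_interval(k, n, reps=1000, alpha=0.05, seed=0):
    """Percentile parametric-bootstrap interval for the distance.

    Assumption: the model fitted to the observed data (distance d_hat)
    is simulated `reps` times, and the distance is re-estimated on each
    simulated data set.
    """
    rng = random.Random(seed)
    d_hat = jc_distance(k, n)
    p_hat = jc_p_diff(d_hat)
    sims = []
    for _ in range(reps):
        # Simulate n independent sites under the fitted model.
        k_sim = sum(rng.random() < p_hat for _ in range(n))
        sims.append(jc_distance(k_sim, n))
    sims.sort()
    lower = sims[int((alpha / 2) * reps)]
    upper = sims[min(reps - 1, int((1 - alpha / 2) * reps))]
    return d_hat, lower, upper


if __name__ == "__main__":
    d_hat, lower, upper = pb_interval(k=50, n=500)
    print(f"estimate {d_hat:.4f}, 95% PB interval ({lower:.4f}, {upper:.4f})")
```

Because the simulations are run at the estimated distance rather than the true one, any error in that estimate carries through to the interval, which is the kind of propagation the abstract traces in the more complex tree setting.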
Pages: 198-222
Page count: 25