Misleading results of likelihood-based phylogenetic analyses in the presence of missing data

Cited by: 83
Authors
Simmons, Mark P. [1]
Affiliations
[1] Colorado State Univ, Dept Biol, Ft Collins, CO 80523 USA
Keywords
MAXIMUM-LIKELIHOOD; GENE TREES; MOLECULAR SYSTEMATICS; RELATIVE PERFORMANCE; SUPERMATRIX ANALYSIS; SPECIES TREES; MIXED MODELS; DATA SETS; PARSIMONY; INFERENCE;
DOI
10.1111/j.1096-0031.2011.00375.x
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
The amount of missing data in many contemporary phylogenetic analyses has substantially increased relative to previous norms, particularly in supermatrix studies that compile characters from multiple previous analyses. In such cases the missing data are non-randomly distributed and usually present in all partitions (i.e. groups of characters) sampled. Parametric methods often provide greater resolution and support than parsimony in such cases, yet this may be caused by extrapolation of branch lengths from one partition to another. In this study I use contrived and simulated examples to demonstrate that likelihood, even when applied to simple matrices with little or no homoplasy, homogeneous evolution across groups of characters, perfect model fit, and hundreds or thousands of variable characters, can provide strong support for incorrect topologies when the matrices have non-random distributions of missing data distributed across all partitions. I do so using a systematic exploration of alternative seven-taxon tree topologies and distributions of missing data in two partitions to demonstrate that these likelihood-based artefacts may occur frequently and are not shared by parsimony. I also demonstrate that Bayesian Markov chain Monte Carlo analysis is more robust to these artefacts than is likelihood.
Pages: 208-222
Number of pages: 15