Statistics and Truth in Phylogenomics

Times cited: 191
Authors
Kumar, Sudhir [1, 2]
Filipski, Alan J. [1]
Battistuzzi, Fabia U. [1]
Pond, Sergei L. Kosakovsky [3]
Tamura, Koichiro [4]
Affiliations
[1] Arizona State Univ, Biodesign Inst, Ctr Evolutionary Med & Informat, Tempe, AZ 85287 USA
[2] Arizona State Univ, Sch Life Sci, Tempe, AZ 85287 USA
[3] Univ Calif San Diego, Dept Med, La Jolla, CA 92093 USA
[4] Tokyo Metropolitan Univ, Dept Biol Sci, Tokyo 158, Japan
Funding
National Institutes of Health (NIH)
Keywords
molecular evolution; statistical inference; phylogenetics; evolutionary tree; statistical bias; variance; DETECTING POSITIVE SELECTION; AMINO-ACID SITES; BAYESIAN POSTERIOR PROBABILITIES; NUCLEOTIDE SUBSTITUTION RATES; MULTIPLE SEQUENCE ALIGNMENT; HORIZONTAL GENE-TRANSFER; FALSE DISCOVERY RATE; EVOLUTIONARY RELATIONSHIPS; PHYLOGENETIC TREES; BRANCH-SITE
DOI
10.1093/molbev/msr202
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (small P values). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses, depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that assessing the robustness of results to biological factors that may systematically mislead (bias) the outcomes of statistical estimation will be key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
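The abstract's argument, that very large data sets shrink variance (and hence P values) without shrinking bias or changing the effect size, can be illustrated numerically. The sketch below is not taken from the paper; it assumes a simple normal (z) approximation for the mean of n independent per-site differences (for example, differences in site-wise log-likelihood between two candidate trees), and the effect size, standard deviation, bias, and site counts are hypothetical values chosen only for illustration.

```python
import math

# Illustrative sketch only (not the paper's method): a normal (z) approximation
# for the mean of n independent per-site differences, e.g. differences in
# site-wise log-likelihood between two candidate trees.

def z_for_mean_difference(delta: float, sigma: float, n: int) -> float:
    """z statistic for mean per-site difference `delta` with per-site SD `sigma` over n sites."""
    standard_error = sigma / math.sqrt(n)  # sampling error shrinks as 1/sqrt(n)
    return delta / standard_error

def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def interval_covers_truth(true_value: float, bias: float, sigma: float, n: int) -> bool:
    """Does a 95% normal interval around a biased estimate still contain the true value?"""
    estimate = true_value + bias  # systematic error does not shrink with n
    half_width = 1.96 * sigma / math.sqrt(n)  # sampling error does
    return abs(estimate - true_value) <= half_width

# Hypothetical values chosen only for illustration.
DELTA, SIGMA, BIAS = 0.01, 1.0, 0.05

for n in (1_000, 100_000, 10_000_000):
    z = z_for_mean_difference(DELTA, SIGMA, n)
    print(
        f"n={n:>10,}  effect size={DELTA / SIGMA:.2f}  z={z:6.2f}  "
        f"P={two_sided_p(z):.2e}  biased CI covers truth: "
        f"{interval_covers_truth(0.0, BIAS, SIGMA, n)}"
    )
```

Across the three hypothetical site counts, the effect size stays at 0.01 while z grows with the square root of n and the P value falls by many orders of magnitude; meanwhile, the 95% interval around the biased estimate covers the true value at n = 1,000 but excludes it at the larger n. A plain z approximation is used here for transparency; it ignores the correlated, model-dependent structure of real phylogenomic data.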
Pages: 457–472
Number of pages: 16