Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies

Cited by: 251
Authors
Leache, Adam D. [1 ,2 ]
Banbury, Barbara L. [1 ]
Felsenstein, Joseph [1 ,3 ]
Nieto-Montes de Oca, Adrian [4 ]
Stamatakis, Alexandros [5 ,6 ]
Affiliations
[1] Univ Washington, Dept Biol, Seattle, WA 98195 USA
[2] Univ Washington, Burke Museum Nat Hist & Culture, Seattle, WA 98195 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] Univ Nacl Autonoma Mexico, Fac Ciencias, Dept Biol Evolut, Mexico City 04510, DF, Mexico
[5] HITS gGmbH, Exelixis Lab, Sci Comp Grp, D-69118 Heidelberg, Germany
[6] Karlsruhe Inst Technol, Inst Theoret Informat, Dept Informat, D-76131 Karlsruhe, Germany
Funding
National Science Foundation (USA);
Keywords
Conditional likelihood; ddRADseq; maximum likelihood; Phrynosoma; Phrynosomatidae; reconstituted DNA; SVDquartets; MAXIMUM-LIKELIHOOD; DISCOVERY; RESOLUTION; DIVERSITY; INFERENCE;
DOI
10.1093/sysbio/syv053
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practices for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach.
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths.
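The two corrections described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes a three-taxon star tree with one shared branch length `t`, the Jukes-Cantor (JC69) model, and noise-free expected site-pattern counts in place of simulated data. All function and variable names are hypothetical. The reconstituted variant relies on the fact that under JC69 the four constant patterns are equally probable, so only the total number of invariant sites is needed.

```python
import math
from itertools import product

STATES = "ACGT"

def jc69_transition(same: bool, t: float) -> float:
    """JC69 probability of ending in the same / a specific different state after t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def pattern_prob(pattern: str, t: float) -> float:
    """Site-pattern probability on a star tree whose tips all have branch length t."""
    total = 0.0
    for root in STATES:
        p = 0.25  # uniform root frequencies
        for tip in pattern:
            p *= jc69_transition(tip == root, t)
        total += p
    return total

def invariant_prob(t: float) -> float:
    """Total probability of the four constant patterns (AAA, CCC, GGG, TTT)."""
    return sum(pattern_prob(s * 3, t) for s in STATES)

VARIABLE_PATTERNS = ["".join(p) for p in product(STATES, repeat=3) if len(set(p)) > 1]

def loglik_uncorrected(counts: dict, t: float) -> float:
    """Treat the SNP-only alignment as if it were a complete alignment."""
    return sum(n * math.log(pattern_prob(pat, t)) for pat, n in counts.items())

def loglik_conditional(counts: dict, t: float) -> float:
    """Condition every site on being variable: divide by 1 - P(invariant)."""
    log_norm = math.log(1.0 - invariant_prob(t))
    return sum(n * (math.log(pattern_prob(pat, t)) - log_norm)
               for pat, n in counts.items())

def loglik_reconstituted(counts: dict, n_invariant: float, t: float) -> float:
    """Add back a user-specified number of unsampled invariant sites."""
    return loglik_uncorrected(counts, t) + n_invariant * math.log(pattern_prob("AAA", t))

def argmax_t(loglik) -> float:
    """Crude grid search over branch lengths 0.001 .. 3.000."""
    grid = [i / 1000.0 for i in range(1, 3001)]
    return max(grid, key=loglik)

TRUE_T = 0.1        # assumed true branch length
N_SITES = 10_000    # assumed total sites before invariant sites are discarded

# Expected (fractional) counts stand in for a simulated SNP-only alignment.
snp_counts = {pat: N_SITES * pattern_prob(pat, TRUE_T) for pat in VARIABLE_PATTERNS}
n_invariant = N_SITES * invariant_prob(TRUE_T)

t_uncorrected = argmax_t(lambda t: loglik_uncorrected(snp_counts, t))
t_conditional = argmax_t(lambda t: loglik_conditional(snp_counts, t))
t_reconstituted = argmax_t(lambda t: loglik_reconstituted(snp_counts, n_invariant, t))

print(f"true t = {TRUE_T}")
print(f"uncorrected    estimate = {t_uncorrected}")
print(f"conditional    estimate = {t_conditional}")
print(f"reconstituted  estimate = {t_reconstituted}")
```

In this idealized setting, both corrections recover the assumed branch length, whereas ignoring the acquisition bias yields a much longer branch. The toy example has no missing data and no sampling noise, so it does not reproduce the difference in branch length accuracy between the two corrections that the abstract reports under simulation.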
Pages: 1032-1047
Number of pages: 16