Improving the estimation of genetic distances from Next-Generation Sequencing data

Cited by: 91
Authors
Vieira, Filipe G. [1, 2]
Lassalle, Florent [3]
Korneliussen, Thorfinn S. [1, 2]
Fumagalli, Matteo [3]
Affiliations
[1] Univ Copenhagen, Ctr GeoGenet, DK-2100 Copenhagen, Denmark
[2] Univ Copenhagen, Nat Hist Museum Denmark, Evogenom Sect, DK-2100 Copenhagen, Denmark
[3] UCL, UCL Genet Inst, Dept Genet Evolut & Environm, London WC1E 6BT, England
Keywords
Bayesian inference; maximum likelihood; phylogenetics; population structure; PHYLOGENY RECONSTRUCTION; POPULATION GENOMICS; ALLELE FREQUENCY; RECOMBINATION; ASSOCIATION; POLYMORPHISM; ADAPTATION; EVOLUTION; INFERENCE; MAP;
DOI
10.1111/bij.12511
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Next-Generation Sequencing (NGS) technologies have revolutionized research in evolutionary biology by increasing sequencing speed and reducing experimental costs. However, sequencing error rates are higher than in traditional technologies and, furthermore, many studies rely on low-depth sequencing. Under these circumstances, standard methods for inferring genotypes lead to biased estimates of nucleotide variation, which can in turn bias all downstream analyses. Through simulations, we assessed the bias in estimating genetic distances under several different scenarios. The results indicate that naive methods for assigning individual genotypes greatly overestimate genetic distances. We propose a novel method to estimate genetic distances that is suitable for low-depth NGS data and takes the statistical uncertainty of genotype calls into account. We applied this method to investigate the genetic structure of domesticated and wild strains of rice. We implemented this approach in open-source software and discuss further directions for phylogenetic analyses within this novel probabilistic framework. (C) 2015 The Linnean Society of London.
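The idea of taking genotype uncertainty into account can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the biallelic diploid coding (0, 1, or 2 alternate alleles), and the per-site distance score are all assumptions made for illustration. The sketch contrasts a distance computed from the most probable genotype at each site with an expected distance weighted by the genotype posterior probabilities.

```python
from itertools import product


def genotype_distance(g1, g2):
    # Assumed per-site score: difference in alternate-allele counts,
    # scaled to [0, 1]. Other scoring schemes are possible.
    return abs(g1 - g2) / 2.0


def expected_distance(post_a, post_b):
    """Mean expected per-site distance between two individuals.

    post_a, post_b: per-site triples (P(g=0), P(g=1), P(g=2)) of
    genotype posterior probabilities (hypothetical input format).
    """
    if len(post_a) != len(post_b) or not post_a:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for pa, pb in zip(post_a, post_b):
        # Weight every genotype pair by its joint posterior probability.
        total += sum(
            pa[g1] * pb[g2] * genotype_distance(g1, g2)
            for g1, g2 in product(range(3), repeat=2)
        )
    return total / len(post_a)


def called_distance(post_a, post_b):
    """Naive alternative: call the most probable genotype at each site."""
    if len(post_a) != len(post_b) or not post_a:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for pa, pb in zip(post_a, post_b):
        ga = max(range(3), key=lambda g: pa[g])
        gb = max(range(3), key=lambda g: pb[g])
        total += genotype_distance(ga, gb)
    return total / len(post_a)


if __name__ == "__main__":
    # One uncertain site for individual a, one confident site for b.
    a = [(0.4, 0.6, 0.0)]
    b = [(1.0, 0.0, 0.0)]
    print("called:  ", called_distance(a, b))
    print("expected:", expected_distance(a, b))
```

In this sketch the called distance treats an uncertain genotype as if it were known, whereas the expected distance scales each possible difference by how probable it is.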
Pages: 139-149
Number of pages: 11
Related papers
32 in total
[1]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, Debbie A. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[2]   A Fine-Scale Chimpanzee Genetic Map from Population Sequencing [J].
Auton, Adam ;
Fledel-Alon, Adi ;
Pfeifer, Susanne ;
Venn, Oliver ;
Segurel, Laure ;
Street, Teresa ;
Leffler, Ellen M. ;
Bowden, Rory ;
Aneas, Ivy ;
Broxholme, John ;
Humburg, Peter ;
Iqbal, Zamin ;
Lunter, Gerton ;
Maller, Julian ;
Hernandez, Ryan D. ;
Melton, Cord ;
Venkat, Aarti ;
Nobrega, Marcelo A. ;
Bontrop, Ronald ;
Myers, Simon ;
Donnelly, Peter ;
Przeworski, Molly ;
McVean, Gil .
SCIENCE, 2012, 336 (6078) :193-198
[3]   Unlocking the vault: next-generation museum population genomics [J].
Bi, Ke ;
Linderoth, Tyler ;
Vanderpool, Dan ;
Good, Jeffrey M. ;
Nielsen, Rasmus ;
Moritz, Craig .
MOLECULAR ECOLOGY, 2013, 22 (24) :6018-6032
[4]   Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering [J].
Browning, Sharon R. ;
Browning, Brian L. .
AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 81 (05) :1084-1097
[5]   Polymorphism in lake trout in Great Bear Lake: intra-lake morphological diversification at two spatial scales [J].
Chavarie, Louise ;
Howland, Kimberly ;
Harris, Les ;
Tonn, William .
BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY, 2015, 114 (01) :109-125
[6]   Linking Great Apes Genome Evolution across Time Scales Using Polymorphism-Aware Phylogenetic Models [J].
De Maio, Nicola ;
Schloetterer, Christian ;
Kosiol, Carolin .
MOLECULAR BIOLOGY AND EVOLUTION, 2013, 30 (10) :2249-2262
[7]   Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle [J].
Desper, R ;
Gascuel, O .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (05) :687-705
[8]   MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus [J].
Ewing, Gregory ;
Hermisson, Joachim .
BIOINFORMATICS, 2010, 26 (16) :2064-2065
[9]   ngsTools: methods for population genetics analyses from next-generation sequencing data [J].
Fumagalli, Matteo ;
Vieira, Filipe G. ;
Linderoth, Tyler ;
Nielsen, Rasmus .
BIOINFORMATICS, 2014, 30 (10) :1486-1487
[10]   Reference-Free Population Genomics from Next-Generation Transcriptome Data and the Vertebrate-Invertebrate Gap [J].
Gayral, Philippe ;
Melo-Ferreira, Jose ;
Glemin, Sylvain ;
Bierne, Nicolas ;
Carneiro, Miguel ;
Nabholz, Benoit ;
Lourenco, Joao M. ;
Alves, Paulo C. ;
Ballenghien, Marion ;
Faivre, Nicolas ;
Belkhir, Khalid ;
Cahais, Vincent ;
Loire, Etienne ;
Bernard, Aurelien ;
Galtier, Nicolas .
PLOS GENETICS, 2013, 9 (04)