Origin matters: Using a local reference genome improves measures in population genomics

Cited by: 20
Authors
Thorburn, Doko-Miles J. [1, 2]
Sagonas, Kostas [1, 3]
Binzer-Panchal, Mahesh [4]
Chain, Frederic J. J. [5]
Feulner, Philine G. D. [6, 7]
Bornberg-Bauer, Erich [8]
Reusch, Thorsten B. H. [9]
Samonte-Padilla, Irene E. [10]
Milinski, Manfred [10]
Lenz, Tobias L. [11, 12]
Eizaguirre, Christophe [1]
Affiliations
[1] Queen Mary Univ London, Sch Biol & Chem Sci, London, England
[2] Imperial Coll London, Dept Life Sci, London, England
[3] Aristotle Univ Thessaloniki, Sch Biol, Dept Zool, Thessaloniki, Greece
[4] Uppsala Univ, Dept Med Biochem & Microbiol, Sci Life Lab, Natl Bioinformat Infrastructure Sweden NBIS, Uppsala, Sweden
[5] Univ Massachusetts Lowell, Dept Biol Sci, Lowell, MA USA
[6] EAWAG Swiss Fed Inst Aquat Sci & Technol, Ctr Ecol Evolut & Biogeochem, Dept Fish Ecol & Evolut, Kastanienbaum, Switzerland
[7] Univ Bern, Inst Ecol & Evolut, Div Aquat Ecol & Evolut, Bern, Switzerland
[8] Univ Munster, Inst Evolut & Biodivers, Evolutionary Bioinformat, Munster, Germany
[9] GEOMAR Helmholtz Ctr Ocean Res, Marine Evolutionary Ecol, Kiel, Germany
[10] Max Planck Inst Evolutionary Biol, Dept Evolutionary Ecol, Plon, Germany
[11] Max Planck Inst Evolutionary Biol, Res Grp Evolutionary Immunogen, Plon, Germany
[12] Univ Hamburg, Dept Biol, Res Unit Evolutionary Immunogen, Hamburg, Germany
Keywords
Gasterosteus aculeatus; genome assembly; gynogenetic; population genomics; read mapping; reference genomes; reference mapping bias; stickleback; STANDING GENETIC-VARIATION; THREESPINE STICKLEBACK; EVOLUTIONARY; SEQUENCE; TOOLKIT; ANNOTATION; ADAPTATION; DISCOVERY; FRAMEWORK; ALIGNMENT
DOI
10.1111/1755-0998.13838
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Genome sequencing enables answering fundamental questions about the genetic basis of adaptation, population structure and epigenetic mechanisms. Yet, we usually need a suitable reference genome for mapping population-level resequencing data. In some model systems, multiple reference genomes are available, giving the challenging task of determining which reference genome best suits the data. Here, we compared the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus), one novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigated the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. Through mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that genetic distance among samples and the reference genomes impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e. π, Tajima's D and F_ST), window-based statistics using different references resulted in different outlier genes and enriched gene functions. A marker-based analysis of DNA methylation distributions had a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias to increase confidence in downstream analyses of the data. Such results have significant implications in all reference-genome-based population genomic analyses.
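As a rough illustration of two of the SNP-based metrics the abstract lists, the sketch below computes nucleotide diversity and Tajima's D for one window of aligned haplotypes. It is a toy example under stated assumptions, not the authors' pipeline: the function names and the input format (equal-length haplotype strings with no missing data) are hypothetical, and F_ST is not covered.

```python
from itertools import combinations
from math import sqrt


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences per site.

    Assumes (hypothetical input format) at least two equal-length
    haplotype strings with no missing data.
    """
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / len(pairs) / length


def tajimas_d(haplotypes):
    """Tajima's D for one window; returns None if no site is variable."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    # Number of segregating sites: alignment columns with more than one allele.
    s = sum(len(set(column)) > 1 for column in zip(*haplotypes))
    if s == 0:
        return None  # D is undefined without segregating sites

    # Mean number of pairwise differences over the whole window.
    k = nucleotide_diversity(haplotypes) * length

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    window = ["ACGT", "ACGA", "ACGT", "TCGT"]  # toy data, two singleton SNPs
    print(nucleotide_diversity(window))
    print(tajimas_d(window))
```

In a window-based scan, such functions would be applied to each genomic window in turn. Because the reference genome determines which reads map and which sites are called, the same window can yield different values under different references.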
Pages: 1706-1723
Number of pages: 18