Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets

Times cited: 25
Author
Bohling, Justin [1]
Affiliation
[1] US Fish & Wildlife Serv, Abernathy Fish Technol Ctr, 1440 Abernathy Creek Rd, Longview, WA 98632 USA
Keywords
biodiversity genomics; conservation genomics; restriction-site associated DNA sequencing; Salmonidae; sequence alignment; BULL TROUT; COHO SALMON; GENERATION; MICROSATELLITE; HYBRIDIZATION; PHYLOGENY; ALIGNMENT; GENETICS; DESIGNS; FORMAT;
DOI
10.1002/ece3.6483
Chinese Library Classification
Q14 [Ecology (bioecology)]
Discipline classification codes
071012; 0713
Abstract
The advent of high-throughput sequencing (HTS) has made genomic-level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species have publicly available reference genomes. Therefore, a common practice is to align reads to the genome of an organism related to the target species; however, this could affect read alignment and bias genotyping. In this study, I conducted an experiment using empirical RADseq datasets generated for two species of salmonids (Actinopterygii; Teleostei; Salmonidae) to address these questions. There are currently reference genomes for six salmonids of varying phylogenetic distance. I aligned the RADseq data to all six genomes and identified variants with several different genotypers; the resulting variants were then fed into population genetic analyses. Increasing phylogenetic distance between target species and reference genome reduced both the proportion of reads that aligned successfully and mapping quality. Reference genome also influenced the number of SNPs that were generated and the depth at those SNPs, although the effect varied by genotyper. Inferences of population structure were mixed: increasing reference genome divergence reduced estimates of differentiation, but similar patterns of population relationships were found across scenarios. These findings reveal how the choice of reference genome can influence the output of bioinformatic pipelines. They also emphasize the need to identify best practices and guidelines for the burgeoning field of biodiversity genomics.
Pages: 7585-7601
Page count: 17