Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data

Cited by: 53
Authors
Torkamaneh, Davoud
Belzile, Francois [1]
Affiliation
[1] Univ Laval, Dept Phytol, Quebec City, PQ G1K 7P4, Canada
Keywords
GENETIC DIVERSITY ANALYSIS; LINKAGE DISEQUILIBRIUM; WIDE ASSOCIATION; IMPUTATION; PREDICTION; MARKERS; SELECTION; ACCURACY; PATTERNS; COVERAGE
DOI
10.1371/journal.pone.0131533
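Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09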
Abstract
Genotyping-by-sequencing (GBS) is a highly cost-effective, high-throughput genotyping approach. By its nature, however, GBS tends to generate sizeable amounts of missing data, which must be imputed for many downstream analyses. The extent to which such missing data can be tolerated when calling SNPs has not been widely explored. In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets. Importantly, we use whole-genome resequencing data to assess the accuracy of the imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs could be called when tolerating up to 80% missing data, a fivefold increase over the number called when tolerating up to 20% missing data. At all levels of missing data examined (between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96-98%). We then used imputation to combine complementary SNP datasets derived from GBS and a SNP array (SoySNP50K). This produced an enhanced dataset of >100,000 SNPs, in which genotypes at previously untyped loci were again imputed with high accuracy (95%). Of the >4,000,000 SNPs identified by resequencing 23 accessions (a subset of the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference to impute this large set of SNPs across the entire panel of 301 accessions. These previously untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP dataset (GBS + SoySNP50K) to perform a genome-wide association study (GWAS) of seed oil content in this collection of soybean accessions. Compared with the smaller catalog obtained from GBS alone at ≤20% missing data, this enhanced SNP catalog considerably increased both the number of significant marker-trait associations and the peak significance levels. Our results demonstrate that imputation can be used to fill in both missing genotypes and untyped loci with very high accuracy, and that this leads to more powerful genetic analyses.
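Two bookkeeping steps underlie the comparisons above: discarding SNPs whose missing-data rate exceeds a threshold (20% versus 80%), and scoring imputed genotypes against resequencing calls. The Python sketch below illustrates both under stated assumptions, not as the authors' pipeline. Genotypes are assumed to be coded as 0/1/2 alternate-allele dosages with None for a missing call, and all lists are assumed to share the same accession order. The function names are hypothetical, and the "imputed" values are made-up stand-ins for the output of an imputation program, which is not reproduced here. Accuracy is computed as simple genotype concordance at cells that were actually imputed; the abstract does not state whether the study used exactly this definition.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions lacking a call at one SNP (an empty SNP counts as fully missing)."""
    if not calls:
        return 1.0
    return sum(g is None for g in calls) / len(calls)


def filter_by_missingness(
    snps: Dict[str, List[Genotype]], max_missing: float
) -> Dict[str, List[Genotype]]:
    """Keep only SNPs whose missing-data rate is at most max_missing (e.g. 0.2 or 0.8)."""
    return {sid: calls for sid, calls in snps.items() if missing_rate(calls) <= max_missing}


def imputation_accuracy(
    raw: Dict[str, List[Genotype]],
    imputed: Dict[str, List[int]],
    truth: Dict[str, List[Genotype]],
) -> float:
    """Genotype concordance between imputed and reference (e.g. resequencing) calls.

    Only cells that were missing before imputation and are called in the
    reference are scored, so observed genotypes do not inflate the accuracy.
    """
    matches = total = 0
    for sid, raw_calls in raw.items():
        if sid not in truth:
            continue  # no reference data for this SNP
        for raw_g, imp_g, true_g in zip(raw_calls, imputed[sid], truth[sid]):
            if raw_g is None and true_g is not None:
                total += 1
                matches += imp_g == true_g
    return matches / total if total else float("nan")


# Toy data: 3 SNPs x 5 accessions (all values made up for illustration).
raw = {
    "snp1": [0, None, 2, 2, 0],           # 20% missing
    "snp2": [None, None, 2, 0, None],     # 60% missing
    "snp3": [None, None, None, None, 2],  # 80% missing
}
# Stand-in for the output of an imputation program (not computed here).
imputed = {
    "snp1": [0, 0, 2, 2, 0],
    "snp2": [2, 0, 2, 0, 0],
    "snp3": [0, 2, 0, 2, 2],
}
# Reference calls exist only for the first three accessions (None elsewhere).
truth = {
    "snp1": [0, 0, 2, None, None],
    "snp2": [2, 2, 2, None, None],
    "snp3": [0, 2, 2, None, None],
}

print(sorted(filter_by_missingness(raw, 0.2)))  # ['snp1']
print(sorted(filter_by_missingness(raw, 0.8)))  # ['snp1', 'snp2', 'snp3']
print(round(imputation_accuracy(raw, imputed, truth), 3))  # 0.667
```

Raising the threshold from 0.2 to 0.8 retains more SNPs, and the accuracy function can then be applied to whichever set is kept. In the study, reference genotypes came from only 23 resequenced accessions, which is why the toy `truth` table has calls for just some accessions.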
Pages: 16