Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions

Cited by: 0
Authors
T Druet
I M Macleod
B J Hayes
Affiliations
[1] Unit of Animal Genomics
[2] Faculty of Veterinary Medicine and Centre for Biomedical Integrative Genoproteomics
[3] University of Liège
[4] Faculty of Land and Food Resources
[5] University of Melbourne
[6] La Trobe University
[7] Biosciences Research Division
[8] Department of Primary Industries
[9] Dairy Futures Cooperative Research Centre
Source
Heredity | 2014 / Vol. 112
Keywords
genome sequencing; genomic prediction; accuracy
DOI
Not available
CLC number
Subject classification number
Abstract
Genomic prediction from whole-genome sequence data is attractive because the accuracy of genomic prediction is no longer bounded by the extent of linkage disequilibrium between DNA markers and the causal mutations affecting the trait, provided that the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. The performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. One strategy (AHAP), which selected for sequencing the subset of individuals that maximized the number of unique haplotypes sequenced (as identified from single-nucleotide polymorphism (SNP) panel data), performed well across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence, and at what fold coverage, given a maximum total sequencing effort. At 600-fold total coverage (600×), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data over dense SNP array genotypes depended strongly on the allele frequency spectrum of the causative mutations affecting the trait. When this spectrum followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%.
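The two design ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' AHAP implementation: the greedy rule, the function names `greedy_select` and `designs_for_budget`, and the toy haplotype labels are all assumptions made for illustration. The first function picks individuals one at a time so that each adds the most haplotypes not yet covered; the second lists how many individuals a fixed total fold coverage allows at each per-individual coverage.

```python
from typing import Dict, List, Set


def greedy_select(haplotypes: Dict[str, Set[str]], n_select: int) -> List[str]:
    """Greedily pick up to n_select individuals, each time taking the one
    that adds the most haplotypes not yet covered (ties go to the lowest ID).
    Illustrative heuristic only; the paper's AHAP criterion may differ."""
    covered: Set[str] = set()
    chosen: List[str] = []
    candidates = sorted(haplotypes)
    for _ in range(min(n_select, len(candidates))):
        best = max(candidates, key=lambda ind: len(haplotypes[ind] - covered))
        if not haplotypes[best] - covered:
            break  # no remaining individual carries an uncovered haplotype
        chosen.append(best)
        covered |= haplotypes[best]
        candidates.remove(best)
    return chosen


def designs_for_budget(total_fold: int, coverages: List[int]) -> Dict[int, int]:
    """Map each per-individual fold coverage to the number of individuals
    that can be sequenced within the total fold-coverage budget."""
    return {c: total_fold // c for c in coverages}


# Hypothetical toy data: haplotype labels carried by four individuals.
haps = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3"},
    "C": {"h3", "h4"},
    "D": {"h1"},
}
selected = greedy_select(haps, 2)
designs = designs_for_budget(600, [4, 8, 12])
```

With a budget of 600-fold total coverage, `designs[8]` is 75, which corresponds to the design reported as optimal in the abstract (75 individuals at eightfold coverage); the sketch only enumerates candidate designs and does not evaluate which is best.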
Pages: 39-47
Number of pages: 8