Improving the efficiency of genomic selection

Cited by: 17
Authors
Scutari, Marco [1]
Mackay, Ian [2]
Balding, David [1]
Affiliations
[1] UCL, Genet Inst, London WC1E 6BT, England
[2] NIAB, Cambridge, England
Funding
Biotechnology and Biological Sciences Research Council (UK); Engineering and Physical Sciences Research Council (UK)
Keywords
genome-wide prediction; genomic selection; feature selection; Markov blanket; linkage disequilibrium; kinship; RIDGE-REGRESSION; WIDE ASSOCIATION; VARIABLE SELECTION; COMPLEX TRAITS; PREDICTION; REGULARIZATION; INFORMATION; RISK;
DOI
10.1515/sagmb-2013-0002
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically-sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performances using three real-world data sets with continuous phenotypic traits from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well, and were the best-performing methods in our study.
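To make "different combinations of centring and scaling of marker genotypes" concrete, the following is a minimal sketch of one commonly used form of marker-based kinship. It assumes genotypes coded as 0/1/2 allele counts in an individuals-by-markers matrix. The function name `kinship` and its `centre` and `scale` parameters are hypothetical, chosen for illustration; they are not taken from the paper, and the estimators the authors actually compared (including the LD-adjusted kinship) may differ from this form.

```python
import numpy as np


def kinship(genotypes, centre=True, scale=True):
    """Marker-based kinship from an (individuals x markers) matrix of 0/1/2 allele counts.

    Illustrative only: this is a generic centred/scaled cross-product,
    not necessarily any of the estimators evaluated in the paper.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0            # estimated allele frequency per marker
    keep = (p > 0.0) & (p < 1.0)        # drop monomorphic markers (zero variance)
    X, p = X[:, keep], p[keep]
    if centre:
        X = X - 2.0 * p                 # subtract the expected allele count
    if scale:
        X = X / np.sqrt(2.0 * p * (1.0 - p))  # divide by the binomial standard deviation
    return X @ X.T / X.shape[1]         # average cross-product over retained markers


# Toy data (made up): 4 individuals, 5 markers, the last one monomorphic.
G = np.array([
    [0, 1, 2, 1, 2],
    [1, 1, 0, 2, 2],
    [2, 0, 1, 1, 2],
    [1, 2, 1, 0, 2],
])
K = kinship(G)
print(K.round(2))
```

Toggling `centre` and `scale` gives the four combinations of centring and scaling. The resulting matrix could then serve as the covariance structure in a GBLUP-style mixed model, which is not shown here.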
Pages: 517-527
Page count: 11