On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction

Times cited: 49
Author
Waldmann, Patrik [1]
Affiliation
[1] Swedish University of Agricultural Sciences (SLU), Department of Animal Breeding and Genetics, Uppsala, Sweden
Keywords
genomic selection; model comparison; accuracy; bias-variance trade-off; coefficient of determination; mean-square error; regression; challenges; selection; LASSO
DOI
10.3389/fgene.2019.00899
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The large number of markers in genome-wide prediction demands the use of methods with regularization, and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the squared Pearson correlation coefficient (r²) as a standardized measure of the predictive accuracy of a model. Based on arguments from bias-variance trade-off theory in statistical learning, we show that shrinkage of the regression coefficients (i.e., QTL effects) reduces the prediction mean squared error (MSE) by introducing model bias, compared with the ordinary least squares method. We also show that the LASSO and the adaptive LASSO (ALASSO) can reduce the model bias and prediction MSE by adding model variance. In an application of ridge regression, the LASSO and the ALASSO to a simulated example based on results for 9,723 SNPs and 3,226 individuals, the LASSO was selected as the best model when r² was used as the measure. However, when model selection was based on the test MSE and the coefficient of determination R², the ALASSO proved to be the best method. Hence, the use of r² may lead to selection of the wrong model and, therefore, to nonoptimal ranking of phenotype predictions and genomic breeding values. Instead, we propose using the test MSE for model selection and R² as a standardized measure of accuracy.
Pages: 4