Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction

Cited by: 14
Authors
Zhou, Yao [1, 2]
Vales, M. Isabel [2 ]
Wang, Aoxue [1 ]
Zhang, Zhiwu [2 ]
Affiliations
[1] Northeast Agr Univ, Coll Life Sci, Harbin, Heilongjiang, Peoples R China
[2] Washington State Univ, Dept Crop & Soil Sci, Pullman, WA 99164 USA
Funding
National Institute of Food and Agriculture (USA);
Keywords
genomic selection; genomic prediction; Pearson correlation; accuracy; cross-validation; HYBRID PERFORMANCE; BREEDING PROGRAMS; CROSS-VALIDATION; SELECTION; ASSOCIATION; SIMULATION; ANIMALS;
DOI
10.1093/bib/bbw064
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population, using cross-validation. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates a correlation for each fold and reports prediction accuracy as the mean of the correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates a single correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula.
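The two definitions in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names `instant_accuracy`, `hold_accuracy`, and `mean_only_cv` are hypothetical, and the mean-only predictor is an assumption chosen here to mimic a situation with no predictive signal. The modified formula mentioned in the abstract is not reproduced, since the abstract does not state it.

```python
import numpy as np


def instant_accuracy(observed, predicted, folds):
    """Mean of the per-fold Pearson correlations (Instant accuracy)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    folds = np.asarray(folds)
    per_fold = [
        np.corrcoef(observed[folds == f], predicted[folds == f])[0, 1]
        for f in np.unique(folds)
    ]
    return float(np.mean(per_fold))


def hold_accuracy(observed, predicted):
    """One Pearson correlation over all held-out predictions pooled (Hold accuracy)."""
    return float(np.corrcoef(observed, predicted)[0, 1])


def mean_only_cv(observed, folds):
    """Hypothetical null predictor: every test fold receives its training-set mean."""
    observed = np.asarray(observed, dtype=float)
    folds = np.asarray(folds)
    predicted = np.empty_like(observed)
    for f in np.unique(folds):
        test = folds == f
        predicted[test] = observed[~test].mean()
    return predicted


# Toy case where the two definitions disagree: the ranking is perfect
# within each fold, but the folds sit on different prediction scales.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
pred = np.array([1.0, 2.0, 3.0, 0.0, 1.0, 2.0])
fold_id = np.array([0, 0, 0, 1, 1, 1])
toy_instant = instant_accuracy(obs, pred, fold_id)
toy_hold = hold_accuracy(obs, pred)

# Signal-free phenotypes (assumed standard normal) in 5 equal-sized folds.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
cv_folds = np.arange(100) % 5
null_hold = hold_accuracy(y, mean_only_cv(y, cv_folds))
print(toy_instant, toy_hold, null_hold)
```

In the toy case the per-fold correlations are both perfect while the pooled correlation is negative, which shows that the two definitions can diverge sharply. In the mean-only case with equal-sized folds, the pooled correlation is negative whenever the fold means differ, because a fold with an above-average mean leaves a below-average training mean. This is only one simple way a downward bias can arise and is not meant to stand in for the paper's full analysis.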
Pages: 744-753
Number of pages: 10