Sample size determination for training set optimization in genomic prediction

Cited by: 9
Authors
Wu, Po-Ya [1 ,2 ]
Ou, Jen-Hsiang [1 ,3 ]
Liao, Chen-Tuo [1 ]
Affiliations
[1] Natl Taiwan Univ, Dept Agron, Taipei, Taiwan
[2] Heinrich Heine Univ, Inst Quant Genet & Genom Plants, Dusseldorf, Germany
[3] Uppsala Univ, Dept Med Biochem & Microbiol, Uppsala, Sweden
Keywords
CALIBRATION SET; LINEAR-MODELS; SELECTION; ACCURACY; INDIVIDUALS; REGRESSION; PRECISION
DOI
10.1007/s00122-023-04254-9
Chinese Library Classification number
S3 [Agronomy]
Discipline classification code
0901
Abstract
Genomic prediction (GP) is a statistical method used to select for quantitative traits in animal or plant breeding. For this purpose, a statistical prediction model is first built using phenotypic and genotypic data from a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals within a breeding population. The sample size of the training set is usually set with regard to the time and space constraints that are inevitable in an agricultural experiment. However, how to determine that sample size remains an unresolved issue in GP studies. By applying the logistic growth curve to relate the prediction accuracy of the GEBVs to the training set size, a practical approach was developed to determine a cost-effective optimal training set for a given genome dataset with known genotypic data. Three real genome datasets were used to illustrate the proposed approach. An R function is provided to facilitate widespread application of this approach to sample size determination, which can help breeders identify a set of genotypes with an economical sample size for selective phenotyping.
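As a rough illustration of how a logistic growth curve can turn into a sample size, the sketch below uses one common three-parameter form, accuracy(n) = K / (1 + a * exp(-b * n)), and finds the smallest training set size whose predicted accuracy reaches a chosen fraction of the plateau K. The abstract does not state the parametrization or the stopping rule the authors use, so the curve form, the 95% cutoff, and the parameter values K = 0.8, a = 9, b = 0.02 are all assumptions made for illustration. This is written in Python and is not the R function the authors provide.

```python
import math


def logistic_accuracy(n, K, a, b):
    """Assumed logistic growth curve: accuracy = K / (1 + a * exp(-b * n)).

    K is the plateau (maximum attainable accuracy), a sets the starting
    level, and b is the growth rate per added training individual.
    """
    return K / (1.0 + a * math.exp(-b * n))


def smallest_size_reaching(fraction, a, b):
    """Smallest integer n with accuracy(n) >= fraction * K.

    Solving K / (1 + a * exp(-b * n)) >= fraction * K for n gives
    n >= ln(a * fraction / (1 - fraction)) / b, which does not depend on K.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if a <= 0.0 or b <= 0.0:
        raise ValueError("a and b must be positive")
    n_real = math.log(a * fraction / (1.0 - fraction)) / b
    return max(0, math.ceil(n_real))


# Hypothetical curve parameters, chosen only for illustration.
K, a, b = 0.8, 9.0, 0.02
n_star = smallest_size_reaching(0.95, a, b)
print(n_star, round(logistic_accuracy(n_star, K, a, b), 4))
```

In practice the parameters would be estimated from accuracies observed at several training set sizes. The cutoff fraction is the lever that trades phenotyping cost against accuracy: a lower fraction gives a smaller, cheaper training set.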
Pages: 13