Sample size determination for training set optimization in genomic prediction

Cited by: 9
Authors
Wu, Po-Ya [1, 2]
Ou, Jen-Hsiang [1, 3]
Liao, Chen-Tuo [1]
Affiliations
[1] Natl Taiwan Univ, Dept Agron, Taipei, Taiwan
[2] Heinrich Heine Univ, Inst Quant Genet & Genom Plants, Dusseldorf, Germany
[3] Uppsala Univ, Dept Med Biochem & Microbiol, Uppsala, Sweden
Keywords
CALIBRATION SET; LINEAR-MODELS; SELECTION; ACCURACY; INDIVIDUALS; REGRESSION; PRECISION
DOI
10.1007/s00122-023-04254-9
Chinese Library Classification
S3 [Agricultural Science (Agronomy)]
Discipline classification code
0901
Abstract
Genomic prediction (GP) is a statistical method used to select quantitative traits in animal or plant breeding. For this purpose, a statistical prediction model is first built that uses phenotypic and genotypic data in a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals within a breeding population. Setting the sample size of the training set usually takes into account time and space constraints that are inevitable in an agricultural experiment. However, the determination of the sample size remains an unresolved issue for a GP study. By applying the logistic growth curve to identify prediction accuracy for the GEBVs and the training set size, a practical approach was developed to determine a cost-effective optimal training set for a given genome dataset with known genotypic data. Three real genome datasets were used to illustrate the proposed approach. An R function is provided to facilitate widespread application of this approach to sample size determination, which can help breeders to identify a set of genotypes with an economical sample size for selective phenotyping.
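The abstract describes relating prediction accuracy to training set size through a logistic growth curve. As a rough illustration only, the Python sketch below fits a three-parameter logistic curve to (training size, accuracy) pairs and reports the smallest size at which the fitted curve reaches a chosen fraction of its plateau. It is not the authors' R function: the grid-search fit, the 95% threshold, the names `fit_logistic` and `size_for_fraction`, and the synthetic accuracy values are all assumptions made for this example.

```python
import math


def logistic(n, K, r, n0):
    """Three-parameter logistic growth curve: plateau K, rate r, midpoint n0."""
    return K / (1.0 + math.exp(-r * (n - n0)))


def fit_logistic(sizes, accs, r_grid, n0_grid):
    """Least-squares fit by grid search over (r, n0).

    For fixed r and n0 the curve is linear in K, so K has a closed-form
    least-squares solution. This is an illustrative fitting strategy,
    not the procedure used in the paper.
    """
    best = None
    for r in r_grid:
        for n0 in n0_grid:
            g = [1.0 / (1.0 + math.exp(-r * (n - n0))) for n in sizes]
            K = sum(a * gi for a, gi in zip(accs, g)) / sum(gi * gi for gi in g)
            sse = sum((a - K * gi) ** 2 for a, gi in zip(accs, g))
            if best is None or sse < best[0]:
                best = (sse, K, r, n0)
    return best[1], best[2], best[3]


def size_for_fraction(K, r, n0, frac=0.95):
    """Smallest integer n with logistic(n) >= frac * K (assumed stopping rule)."""
    return math.ceil(n0 + math.log(frac / (1.0 - frac)) / r)


# Synthetic, noise-free accuracies generated from a known curve (not real data).
sizes = list(range(25, 401, 25))
accs = [logistic(n, 0.8, 0.02, 100.0) for n in sizes]

r_grid = [i * 0.005 for i in range(1, 21)]        # candidate growth rates
n0_grid = [float(v) for v in range(0, 401, 10)]   # candidate midpoints

K_hat, r_hat, n0_hat = fit_logistic(sizes, accs, r_grid, n0_grid)
n_opt = size_for_fraction(K_hat, r_hat, n0_hat)
print(f"plateau={K_hat:.3f}, rate={r_hat:.3f}, midpoint={n0_hat:.0f}, size={n_opt}")
```

In practice the accuracies would come from evaluating candidate training sets of increasing size, and the threshold would reflect the phenotyping budget; a finer grid or a nonlinear least-squares routine could replace the coarse search used here.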
Pages: 13
Related papers
50 in total
  • [31] Within-family genomic selection in strawberry: Optimization of marker density, trial design, and training set composition
    Sleper, Joshua
    Tapia, Ronald
    Lee, Seonghee
    Whitaker, Vance
    PLANT GENOME, 2025, 18 (01)
  • [32] Interactive Prior Elicitation of Feature Similarities for Small Sample Size Prediction
    Afrabandpey, Homayun
    Peltola, Tomi
    Kaski, Samuel
    PROCEEDINGS OF THE 25TH CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION (UMAP'17), 2017, : 265 - 269
  • [33] Genomic prediction and quantitative trait locus discovery in a cassava training population constructed from multiple breeding stages
    Somo, Mohamed
    Kulembeka, Heneriko
    Mtunda, Kiddo
    Mrema, Emmanuel
    Salum, Kasele
    Wolfe, Marnin D.
    Rabbi, Ismail Y.
    Egesi, Chiedozie
    Kawuki, Robert
    Ozimati, Alfred
    Lozano, Roberto
    Jannink, Jean-Luc
    CROP SCIENCE, 2020, 60 (02) : 896 - 913
  • [34] Genomic prediction using composite training sets is an effective method for exploiting germplasm conserved in rice gene banks
    He, Sang
    Liu, Hongyan
    Zhan, Junhui
    Meng, Yun
    Wang, Yamei
    Wang, Feng
    Ye, Guoyou
    CROP JOURNAL, 2022, 10 (04) : 1073 - 1082
  • [35] Improving Genomic Prediction with Machine Learning Incorporating TPE for Hyperparameters Optimization
    Liang, Mang
    An, Bingxing
    Li, Keanning
    Du, Lili
    Deng, Tianyu
    Cao, Sheng
    Du, Yueying
    Xu, Lingyang
    Gao, Xue
    Zhang, Lupei
    Li, Junya
    Gao, Huijiang
    BIOLOGY-BASEL, 2022, 11 (11)
  • [36] Bayesian Sample Size Determination for Causal Discovery
    Castelletti, Federico
    Consonni, Guido
    STATISTICAL SCIENCE, 2024, 39 (02) : 305 - 321
  • [37] Sample size determination for clustered count data
    Amatya, Anup
    Bhaumik, Dulal
    Gibbons, Robert D.
    STATISTICS IN MEDICINE, 2013, 32 (24) : 4162 - 4179
  • [38] Understanding Sample Size Determination in Nursing Research
    Hayat, Matthew J.
    WESTERN JOURNAL OF NURSING RESEARCH, 2013, 35 (07) : 943 - 956
  • [39] Sample size determination: posterior distributions proximity
    Kiselev, Nikita
    Grabovoy, Andrey
    COMPUTATIONAL MANAGEMENT SCIENCE, 2025, 22 (01)
  • [40] A Modified Bayesian Optimization Approach for Determining a Training Set to Identify the Best Genotypes from a Candidate Population in Genomic Selection
    Tu, Hui-Ning
    Liao, Chen-Tuo
    JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS, 2024