Regularized multi-trait multi-locus linear mixed models for genome-wide association studies and genomic selection in crops

Cited by: 2
Authors
Lozano, Aurelie C. [1]
Ding, Hantian [2]
Abe, Naoki [1]
Lipka, Alexander E. [3]
Affiliations
[1] IBM TJ Watson Res Ctr, IBM Res, Yorktown Hts, NY USA
[2] Univ Penn, Philadelphia, PA USA
[3] Univ Illinois, Dept Crop Sci, Champaign, IL 61820 USA
Keywords
Multi-trait multi-locus linear mixed model; GWAS and genomic selection in plants; Regularization; GENETIC-HETEROGENEITY; VARIABLE SELECTION; POPULATION; PREDICTION; REGRESSION;
DOI
10.1186/s12859-023-05519-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: We consider two key problems in genomics involving multiple traits: multi-trait genome-wide association studies (GWAS), where the goal is to detect genetic variants associated with the traits; and multi-trait genomic selection (GS), where the emphasis is on accurately predicting trait values. Multi-trait linear mixed models build on the linear mixed model to jointly model multiple traits. Existing estimation methods, however, are limited to the joint analysis of a small number of genotypes; in fact, most approaches consider one SNP at a time. Estimating multi-dimensional genetic and environmental effects also imposes a considerable computational burden. Efficient approaches that incorporate regularization into multi-trait linear models (no random effects) have recently been proposed to identify genomic loci associated with multiple traits (Yu et al. in Multitask learning using task clustering with applications to predictive modeling and GWAS of plant varieties. arXiv:1710.01788, 2017; Yu et al. in Front Big Data 2:27, 2019), but these ignore population structure and familial relatedness (Yu et al. in Nat Genet 38:203-208, 2006).
Results: This work addresses this gap by proposing a novel class of regularized multi-trait linear mixed models, along with scalable approaches for estimation in the presence of high-dimensional genotypes and a large number of traits. We evaluate the effectiveness of the proposed methods using datasets from maize and sorghum diversity panels, and demonstrate benefits both in achieving high prediction accuracy in GS and in identifying relevant marker-trait associations.
Conclusions: The proposed regularized multivariate linear mixed models are relevant for both GWAS and GS. We hope that they will facilitate agronomy-related research in plant biology and crop breeding endeavors.
Pages: 15