Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations

Cited by: 0
Authors
Gibbs, Patrick M. [1]
Paril, Jefferson F. [1, 2]
Fournier-Level, Alexandre [1]
Affiliations
[1] Univ Melbourne, Sch BioSci, Royal Parade, Parkville, Vic 3010, Australia
[2] La Trobe Univ, Dept Energy Environm & Climate Act, Agr Victoria Res, AgriBio, 5 Ring Rd, Bundoora, Vic 3083, Australia
Keywords
Arabidopsis thaliana; penalized regression; machine learning; population structure; model selection; genomic prediction; native range; regularization; accuracy; paths; loci
DOI
10.1093/genetics/iyaf003
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Genomic prediction applies to any agro- or ecologically relevant traits, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology including epistasis. However, the applicability of different genomic prediction models, including non-linear, non-parametric approaches, has not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates genomic prediction sensitivity to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits in 1,000+ natural genotypes of the model plant Arabidopsis thaliana, we assessed the performance of penalized regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits where random forest performed best. We link this result to the genetic architecture of each trait, notably that biochemical traits have simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering time and yield, were strongly correlated to population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.
Pages: 11