HGATGS: Hypergraph Attention Network for Crop Genomic Selection

Times cited: 0
Authors
He, Xuliang [1, 2]
Wang, Kaiyi [2 ]
Zhang, Liyang [3 ]
Zhang, Dongfeng [2 ]
Yang, Feng [2 ]
Zhang, Qiusi [2 ]
Pan, Shouhui [2, 4]
Li, Jinlong [5 ]
Bai, Longpeng [2, 6]
Sun, Jiahao [2, 7]
Liu, Zhongqiang [2 ]
Affiliations
[1] Shenyang Agr Univ, Coll Informat & Elect Engn, Shenyang 110866, Peoples R China
[2] Natl Innovat Ctr Digital Seed Ind, Beijing 100097, Peoples R China
[3] SDIC Seed Technol Co Ltd, Beijing 100034, Peoples R China
[4] Beijing PAIDE Sci & Technol Dev Co Ltd, Beijing 100097, Peoples R China
[5] Beijing Forestry Univ, Coll Biol Sci & Technol, State Key Lab Tree Genet & Breeding, Beijing 100083, Peoples R China
[6] Shanghai Ocean Univ, Minist Agr & Rural Affairs, Key Lab Fisheries Informat, Shanghai 201306, Peoples R China
[7] Northwest A&F Univ, Coll Agron, Yangling 712100, Peoples R China
Source
AGRICULTURE-BASEL | 2025 / Vol. 15 / Issue 04
Keywords
hypergraph convolutional neural network; genomic selection; deep learning; crop phenotypic prediction; linear unbiased prediction; classification; information
DOI
10.3390/agriculture15040409
Chinese Library Classification
S3 [Agricultural Science (Agronomy)];
Discipline classification code
0901;
Abstract
Many important agronomic traits of plants, such as crop yield and stress tolerance, are controlled by multiple genes and exhibit complex inheritance patterns. Traditional breeding methods often struggle with these traits because of their complexity. Genomic selection (GS), which uses high-density molecular markers across the entire genome to facilitate selection in breeding programs, excels at capturing the genetic variation associated with these traits, enabling more accurate and efficient selection in breeding. Traditional crop genomic selection models, based on statistical methods or machine learning, often treat samples as independent entities and neglect the abundant latent relational information among them, which limits their predictive performance. In this study, we propose a novel crop genomic selection model based on hypergraph attention networks for genomic prediction (HGATGS). The model incorporates dynamic hyperedges, constructed from sample similarity, to validate the efficacy of high-order relationships between samples for phenotypic prediction. An attention mechanism assigns weights to different hyperedges and nodes, enhancing the model's ability to capture kinship relationships among samples. Residual connections between hypergraph convolutional layers further improve model stability and performance. The model was validated on datasets for multiple crops, including wheat, corn, and rice. The results showed that HGATGS significantly outperformed traditional statistical methods and machine learning models on the Wheat 599, Rice 299, and G2F 2017 datasets. On Wheat 599, HGATGS achieved a correlation coefficient of 0.54, a 14.9% improvement over methods such as R-BLUP and BayesA (0.47). On Rice 299, HGATGS reached 0.45, a 66.7% increase over models such as R-BLUP and SVR (0.27). On G2F 2017, HGATGS attained 0.88, slightly surpassing models such as R-BLUP and BayesA (0.87). Ablation experiments on the three datasets showed that the model integrating both hypergraph attention and residual connections performed best. A subsequent comparison of prediction performance across different dynamically selected values of K showed optimal performance when K = (3,4). Prediction performance was also compared across different numbers of single nucleotide polymorphisms (SNPs) and different sample sizes in various datasets, with HGATGS consistently outperforming the comparison models. Finally, visualizations of the constructed hypergraph structures showed that certain nodes have high connection densities with hyperedges. These nodes often represent varieties or genotypes with significant impacts on traits. During feature aggregation, these high-connectivity nodes contribute significantly to the prediction results and demonstrate better prediction performance across multiple traits in multiple crops. These results indicate that constructing hypergraphs from correlation relationships is a highly effective approach to prediction.
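The abstract describes hyperedges built from sample similarity and controlled by a neighbourhood size K. The following is a minimal sketch of one common way to do this, not the authors' implementation: it assumes that each sample spawns one hyperedge containing itself and its K most correlated samples, and that similarity is the Pearson correlation between SNP vectors coded 0/1/2. The function names (`pearson`, `knn_hyperedges`, `incidence_matrix`) and the toy genotype matrix are hypothetical, and the attention weighting and residual connections mentioned in the abstract are omitted.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / sqrt(var_a * var_b)


def knn_hyperedges(genotypes, k):
    """Build one hyperedge per sample: the sample plus its k most similar samples."""
    n = len(genotypes)
    edges = []
    for i in range(n):
        sims = [(pearson(genotypes[i], genotypes[j]), j) for j in range(n) if j != i]
        # Highest correlation first; ties broken by sample index for determinism.
        sims.sort(key=lambda t: (-t[0], t[1]))
        edges.append(sorted([i] + [j for _, j in sims[:k]]))
    return edges


def incidence_matrix(edges, n_nodes):
    """H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    H = [[0] * len(edges) for _ in range(n_nodes)]
    for e, members in enumerate(edges):
        for v in members:
            H[v][e] = 1
    return H


# Toy SNP matrix (hypothetical data): rows are samples, columns are markers.
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 2, 2],
    [2, 1, 0, 2, 1, 0],
    [2, 1, 0, 2, 0, 0],
    [1, 1, 1, 0, 2, 1],
]

edges = knn_hyperedges(genotypes, k=2)
H = incidence_matrix(edges, len(genotypes))
# Node degree = number of hyperedges a sample belongs to.
node_degree = [sum(row) for row in H]
print(edges)
print(node_degree)
```

In this construction, samples that are similar to many others appear in many hyperedges and therefore have a high node degree, which is one way to read the abstract's observation that some nodes have high connection densities with hyperedges.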
Pages: 23