A transformer-based genomic prediction method fused with knowledge-guided module

Cited by: 9
Authors
Wu, Cuiling [1]
Zhang, Yiyi [1]
Ying, Zhiwen [1]
Li, Ling [1]
Wang, Jun [1]
Yu, Hui [1]
Zhang, Mengchen [1]
Feng, Xianzhong [1]
Wei, Xinghua [1]
Xu, Xiaogang [1]
Affiliations
[1] Zhejiang Gongshang Univ, Sch Comp & Informat Engn, Hangzhou 310018, Peoples R China
Keywords
deep learning; Transformer; genomic prediction; knowledge-guided module; prediction method; REGRESSION; SELECTION; TRAITS;
DOI
10.1093/bib/bbad438
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Genomic prediction (GP) uses single nucleotide polymorphisms (SNPs) to establish associations between markers and phenotypes. Selection of early individuals by genomic estimated breeding value shortens the generation interval and speeds up the breeding process. Recently, methods based on deep learning (DL) have gained great attention in the field of GP. In this study, we explore the application of Transformer-based structures to GP and develop a novel deep-learning model named GPformer. GPformer obtains a global view by gleaning beneficial information from all relevant SNPs regardless of the physical distance between SNPs. Comprehensive experimental results on five different crop datasets show that GPformer outperforms ridge regression-based linear unbiased prediction (RR-BLUP), support vector regression (SVR), light gradient boosting machine (LightGBM) and deep neural network genomic prediction (DNNGP) in terms of mean absolute error, Pearson's correlation coefficient and the proposed metric consistent index. Furthermore, we introduce a knowledge-guided module (KGM) to extract genome-wide association studies-based information, which is fused into GPformer as prior knowledge. KGM is very flexible and can be plugged into any DL network. Ablation studies of KGM on three datasets illustrate the efficiency of KGM adequately. Moreover, GPformer is robust and stable to hyperparameters and can generalize to each phenotype of every dataset, which is suitable for practical application scenarios.
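The abstract compares methods by mean absolute error and Pearson's correlation coefficient between observed and predicted phenotypes. The following is a minimal sketch of how those two metrics could be computed, not the authors' evaluation code. The function names and the toy phenotype values are hypothetical and chosen only for illustration; the proposed "consistent index" is not sketched because its definition is not given here.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical observed and predicted phenotype values (not from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]

mae = mean_absolute_error(observed, predicted)
r = pearson_r(observed, predicted)
print(f"MAE = {mae:.3f}, Pearson r = {r:.3f}")
```

A lower mean absolute error and a higher correlation both indicate closer agreement with the observed phenotypes, which is why the two are usually reported together.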
Pages: 11