DAIRRy-BLUP: A High-Performance Computing Approach to Genomic Prediction

Times cited: 6
Authors
De Coninck, Arne [1]
Fostier, Jan [2, 3]
Maenhout, Steven [4]
De Baets, Bernard [1]
Affiliations
[1] Univ Ghent, Res Unit Knowledge Based Syst KERMIT, Dept Math Modelling Stat & Bioinformat, B-9000 Ghent, Belgium
[2] Ghent Univ IMinds, IBCN, B-9000 Ghent, Belgium
[3] Ghent Univ IMinds, Serv Res Unit, Dept Informat Technol, B-9000 Ghent, Belgium
[4] Progeno, B-9052 Zwijnaarde, Belgium
Keywords
RIDGE-REGRESSION; SELECTION; INFORMATION; GENETICS; SIMULATION; ALGORITHM
DOI
10.1534/genetics.114.163683
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification code
071007; 090102
Abstract
In genomic prediction, common analysis methods rely on a linear mixed-model framework to estimate SNP marker effects and breeding values of animals or plants. Ridge regression-best linear unbiased prediction (RR-BLUP) is based on the assumptions that SNP marker effects are normally distributed, are uncorrelated, and have equal variances. We propose DAIRRy-BLUP, a parallel, Distributed-memory RR-BLUP implementation, based on single-trait observations (y), that uses the Average Information algorithm for restricted maximum-likelihood estimation of the variance components. The goal of DAIRRy-BLUP is to enable the analysis of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A distributed-memory framework is required since the dimensionality of the problem, determined by the number of SNP markers, can become too large to be analyzed by a single computing node. Initial results show that DAIRRy-BLUP enables the analysis of very large-scale data sets (up to 1,000,000 individuals and 360,000 SNPs) and indicate that increasing the number of phenotypic and genotypic records has a more significant effect on the prediction accuracy than increasing the density of SNP arrays.
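A minimal sketch of the RR-BLUP model described above, not the authors' distributed implementation: assume y = 1μ + Zu + e with u ~ N(0, Iσu²) and e ~ N(0, Iσe²), where Z holds one column per SNP marker. For a known variance ratio λ = σe²/σu², the intercept and marker effects solve Henderson's mixed-model equations, whose coefficient matrix has one row per marker plus one for the intercept. The variance-component estimation step (AI-REML in DAIRRy-BLUP) is omitted and λ is simply assumed given; the helper names `solve_linear` and `rr_blup` are hypothetical.

```python
"""Minimal RR-BLUP sketch (hypothetical helpers, not DAIRRy-BLUP itself).

Solves the mixed-model equations for y = 1*mu + Z u + e, assuming the
variance ratio lam = sigma_e^2 / sigma_u^2 is already known.
"""

from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (dense, small systems only)."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rr_blup(z: Matrix, y: List[float], lam: float):
    """Return (mu, marker_effects, breeding_values) for y = 1*mu + Z u + e."""
    n, p = len(z), len(z[0])
    # Design matrix W = [1 | Z]; coefficient matrix C = W'W + diag(0, lam, ..., lam).
    w = [[1.0] + [float(v) for v in row] for row in z]
    k = p + 1  # system size follows the number of markers, not individuals
    c = [[sum(w[i][a] * w[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    for j in range(1, k):
        c[j][j] += lam  # ridge penalty on marker effects only, not on mu
    rhs = [sum(w[i][a] * y[i] for i in range(n)) for a in range(k)]
    sol = solve_linear(c, rhs)
    mu, u = sol[0], sol[1:]
    # Genomic breeding values are the summed marker effects per individual.
    gebv = [sum(z[i][j] * u[j] for j in range(p)) for i in range(n)]
    return mu, u, gebv
```

With λ close to zero the solution approaches ordinary least squares; with a very large λ the marker effects shrink toward zero and μ approaches the phenotypic mean. Because the coefficient matrix is (p + 1) × (p + 1) for p markers, its dense storage grows quadratically with marker density, which is why the problem can outgrow a single computing node.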
Pages: 813+
Number of pages: 12