Marker imputation efficiency for genotyping-by-sequencing data in rice (Oryza sativa) and alfalfa (Medicago sativa)

Cited by: 39
Authors
Nazzicari, Nelson [1]
Biscarini, Filippo [2]
Cozzi, Paolo [2]
Brummer, E. Charles [3]
Annicchiarico, Paolo [2]
Affiliations
[1] Res Ctr Fodder Crops & Dairy Prod, Council Agr Res & Econ CREA, Lodi, Italy
[2] Fdn Parco Tecnol Padano, Dipartimento Bioinformat, Lodi, Italy
[3] Univ Calif Davis, Dept Plant Sci, Davis, CA 95616 USA
Keywords
SNP; Genotyping by sequencing (GBS); K-nearest neighbors imputation (KNNI); Random Forest imputation (RFI); Singular value decomposition imputation (SVDI); Beagle; FILLIN; Alfalfa; Rice; Imputation; Reference genome; GENOMIC SELECTION; READ ALIGNMENT; LINKAGE MAP; ASSOCIATION; POPULATIONS; ACCURACY
DOI
10.1007/s11032-016-0490-y
Chinese Library Classification number
S3 [Agricultural science (agronomy)]
Discipline classification code
0901
Abstract
Genotyping-by-sequencing (GBS) is a rapid and cost-effective genome-wide genotyping technique that can be applied whether or not a reference genome is available. Because of the trade-off between cost and coverage, however, GBS typically produces a large proportion of missing marker genotypes, whose imputation is therefore both challenging and critical for later analyses. In this work, the performance of four general imputation methods (K-nearest neighbors, Random Forest, singular value decomposition, and mean value) and two genotype-specific methods (Beagle and FILLIN) was measured on GBS data from alfalfa (Medicago sativa L., autotetraploid, heterozygous, without a reference genome) and rice (Oryza sativa L., diploid, 100% homozygous, with a reference genome). Alfalfa SNPs were aligned to the genome of the closely related species Medicago truncatula L. Benchmarks consisted of progressive data filtering for marker call rate (up to 70%) and increasing proportions (up to 20%) of known genotypes masked for imputation. Relative performance was measured as the proportion of correctly imputed genotypes, both globally and within each genotype class (two homozygotes in rice; two homozygotes and one heterozygote in alfalfa). Imputation accuracy was robust to increasing missing rates and consistently higher in rice than in alfalfa. Accuracy was as high as 90-100% for the major (most frequent) homozygous genotype, but dropped to 80-90% in rice and below 30% in alfalfa for the minor homozygous genotype. In rice, Beagle was the best-performing method in both accuracy and computing time. In alfalfa, K-nearest neighbors imputation (KNNI) and Random Forest imputation (RFI) gave the highest accuracies, but KNNI was much faster.
Pages: 16
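The benchmark outlined in the abstract (mask a proportion of known genotypes, impute them, then score the proportion recovered overall and within each genotype class) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the genotype coding (0 and 2 for the two homozygotes, 1 for the heterozygote), the `MISSING` sentinel, and the function names are hypothetical, and the per-marker mode imputer is a simple stand-in, not one of the six methods evaluated in the paper.

```python
import random
from collections import Counter

MISSING = None  # hypothetical sentinel for a missing genotype call


def mask_known(genotypes, proportion, seed=0):
    """Hide a proportion of the known calls; return the masked copy and hidden positions."""
    rng = random.Random(seed)
    known = [(i, j) for i, row in enumerate(genotypes)
             for j, g in enumerate(row) if g is not MISSING]
    hidden = rng.sample(known, int(round(proportion * len(known))))
    masked = [list(row) for row in genotypes]
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def impute_mode(genotypes):
    """Fill each missing call with the most frequent observed genotype at that marker."""
    filled = [list(row) for row in genotypes]
    for j in range(len(genotypes[0])):
        observed = [row[j] for row in genotypes if row[j] is not MISSING]
        if not observed:
            continue  # no observed calls at this marker; leave it missing
        mode = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[j] is MISSING:
                row[j] = mode
    return filled


def accuracy(truth, imputed, hidden):
    """Proportion of hidden calls recovered, overall and per true genotype class."""
    hits, totals = Counter(), Counter()
    for i, j in hidden:
        cls = truth[i][j]
        totals[cls] += 1
        hits[cls] += int(imputed[i][j] == cls)
    overall = sum(hits.values()) / len(hidden) if hidden else float("nan")
    per_class = {cls: hits[cls] / totals[cls] for cls in totals}
    return overall, per_class


# Toy usage: rows are samples, columns are markers.
truth = [[0, 2], [0, 2], [0, 0], [0, 2], [2, 2]]
masked, hidden = mask_known(truth, proportion=0.2, seed=1)
overall, per_class = accuracy(truth, impute_mode(masked), hidden)
```

Scoring per class rather than only globally matters because a mode-style imputer always returns the most frequent genotype at a marker, so it can look accurate overall while rarely recovering the minor homozygote, which is the kind of imbalance the abstract reports.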