An empirical evaluation of genotype imputation of ancient DNA

Cited by: 12
Authors
Ausmees, Kristiina [1]
Sanchez-Quinto, Federico [2, 3]
Jakobsson, Mattias [3]
Nettelblad, Carl [1]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
[2] Inst Nacl Med Genom INMEGEN, Mexico City 14610, DF, Mexico
[3] Uppsala Univ, Dept Organismal Biol, Human Evolut, S-75236 Uppsala, Sweden
Keywords
imputation; phasing; ancient DNA; low coverage; reference bias; MISSING-DATA; LINKAGE DISEQUILIBRIUM; INFERENCE; ACCURACY; PATTERNS; DAMAGE; ASSOCIATION; SEQUENCES; THOUSANDS; ADMIXTURE;
DOI
10.1093/g3journal/jkac089
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
With capabilities of sequencing ancient DNA to high coverage often limited by sample quality or cost, imputation of missing genotypes presents a possibility to increase the power of inference as well as cost-effectiveness for the analysis of ancient data. However, the high degree of uncertainty often associated with ancient DNA poses several methodological challenges, and performance of imputation methods in this context has not been fully explored. To gain further insights, we performed a systematic evaluation of imputation of ancient data using Beagle v4.0 and reference data from phase 3 of the 1000 Genomes project, investigating the effects of coverage, phased reference, and study sample size. Making use of five ancient individuals with high-coverage data available, we evaluated imputed data for accuracy, reference bias, and genetic affinities as captured by principal component analysis. We obtained genotype concordance levels of over 99% for data with 1x coverage, and similar levels of accuracy and reference bias at levels as low as 0.75x. Our findings suggest that using imputed data can be a realistic option for various population genetic analyses even for data in coverage ranges below 1x. We also show that a large and varied phased reference panel as well as the inclusion of low- to moderate-coverage ancient individuals in the study sample can increase imputation performance, particularly for rare alleles. In-depth analysis of imputed data with respect to genetic variants and allele frequencies gave further insight into the nature of errors arising during imputation, and can provide practical guidelines for postprocessing and validation prior to downstream analysis.
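The genotype concordance figures quoted in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and that concordance is the fraction of jointly called sites where the imputed genotype equals the high-coverage genotype. The function name `genotype_concordance` and this encoding are illustrative assumptions; the paper's exact definition and filtering may differ.

```python
from typing import Optional, Sequence


def genotype_concordance(imputed: Sequence[Optional[int]],
                         truth: Sequence[Optional[int]]) -> float:
    """Fraction of jointly called sites where the two genotypes agree.

    Assumption (not from the paper): genotypes are alternate-allele
    counts 0/1/2, and None marks a missing call, which is skipped.
    """
    if len(imputed) != len(truth):
        raise ValueError("genotype vectors must have the same length")

    compared = 0
    matches = 0
    for g_imp, g_true in zip(imputed, truth):
        if g_imp is None or g_true is None:
            continue  # site not called in one of the two data sets
        compared += 1
        if g_imp == g_true:
            matches += 1

    if compared == 0:
        raise ValueError("no sites called in both data sets")
    return matches / compared


# Toy data: six sites, one missing in the high-coverage "truth" set.
truth_calls = [0, 1, 2, 0, None, 1]
imputed_calls = [0, 1, 1, 0, 2, 1]
concordance = genotype_concordance(imputed_calls, truth_calls)
```

In this toy example five sites are compared and four agree. A per-genotype-class breakdown (for instance, concordance at heterozygous sites only) would be a natural extension when looking at rare alleles, where overall concordance can hide errors.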
Pages: 8