Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

Times cited: 104
Authors
Wang, Chaolong [1]
Zhan, Xiaowei [2]
Liang, Liming [3,4]
Abecasis, Goncalo R. [5,6]
Lin, Xihong [3]
Affiliations
[1] Genome Inst Singapore, Dept Computat & Syst Biol, Singapore 138672, Singapore
[2] UT Southwestern Med Ctr, Quantitat Biomed Res Ctr, Dept Clin Sci, Ctr Genet Host Def, Dallas, TX 75235 USA
[3] Harvard Univ, TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[4] Harvard Univ, TH Chan Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[5] Univ Michigan, Sch Publ Hlth, Dept Biostat, Ann Arbor, MI 48109 USA
[6] Univ Michigan, Sch Publ Hlth, Ctr Stat Genet, Ann Arbor, MI 48109 USA
Keywords
GENOME-WIDE ASSOCIATION; RARE VARIANTS; POPULATION STRATIFICATION; LOCI; SUSCEPTIBILITY
DOI
10.1016/j.ajhg.2015.04.018
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
Pages: 926-937
Number of pages: 12
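The abstract describes projection Procrustes analysis as the step that carries high-dimensional principal component coordinates into a low-dimensional reference ancestry map. The Python/NumPy sketch below illustrates that kind of fit under stated assumptions; it is not the LASER 2.0 code. It assumes the fit minimizes ||Y - rho*X*A - 1*b^T||^2 over a scale rho, a k x q matrix A with orthonormal columns, and a translation b. It solves this with an assumed iterative scheme that pads Y with k - q extra columns and alternates between an orthogonal Procrustes fit (with scaling and translation) and refilling those columns with their fitted values, so each pass cannot increase the loss. The names `projection_procrustes` and `project`, and the `max_iter` and `tol` parameters, are hypothetical.

```python
import numpy as np

# Illustrative sketch with hypothetical names; not the LASER 2.0 implementation.


def projection_procrustes(X, Y, max_iter=1000, tol=1e-10):
    """Fit Y ~ rho * X @ A + b with A (k x q) having orthonormal columns.

    X: (n, k) coordinates in a higher-dimensional space, k >= q.
    Y: (n, q) target coordinates, e.g. a reference PC ancestry map.
    Returns (rho, A, b, losses), where losses[i] is the residual sum of
    squares ||Y - (rho * X @ A + b)||^2 after iteration i.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, k = X.shape
    q = Y.shape[1]
    if k < q:
        raise ValueError("X needs at least as many columns as Y")

    # Pad Y with k - q extra columns; they start at zero and are refilled
    # with the current fitted values after every iteration (assumed scheme).
    Z = np.hstack([Y, np.zeros((n, k - q))])
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    x_ss = np.sum(Xc ** 2)
    losses = []
    for _ in range(max_iter):
        # Orthogonal Procrustes with scaling and translation: Z ~ rho * X @ R + t.
        z_mean = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc.T @ (Z - z_mean))
        R = U @ Vt  # k x k orthogonal matrix (reflections allowed)
        rho = s.sum() / x_ss
        t = z_mean - rho * x_mean @ R
        fitted = rho * X @ R + t
        losses.append(float(np.sum((Y - fitted[:, :q]) ** 2)))
        # Fill the padded columns with their fitted values for the next pass.
        Z[:, q:] = fitted[:, q:]
        if len(losses) > 1 and losses[-2] - losses[-1] <= tol * max(losses[-2], 1.0):
            break
    return rho, R[:, :q], t[:q], losses


def project(x_new, rho, A, b):
    """Map new high-dimensional coordinates into the q-dimensional target space."""
    return rho * np.asarray(x_new, dtype=float) @ A + b


# Usage on synthetic data: a 2-D map generated from 4-D coordinates.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
A_true, _ = np.linalg.qr(rng.standard_normal((4, 2)))
Y = 1.5 * X @ A_true + np.array([0.3, -0.7])
rho, A, b, losses = projection_procrustes(X, Y)
print("scale:", rho, "final loss:", losses[-1])
```

As an illustrative assumption only, X could hold the reference individuals' coordinates from a higher-dimensional PCA that also includes one study sample, and Y their coordinates on the reference PC map; `project` would then place that study sample's row of the higher-dimensional PCA onto the reference map. When k equals q the loop reduces to a single ordinary Procrustes fit with scaling and translation.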