Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness

Cited by: 271
Authors
Conomos, Matthew P. [1]
Miller, Michael B. [2]
Thornton, Timothy A. [1]
Affiliations
[1] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[2] Univ Minnesota, Dept Psychol, Minneapolis, MN 55454 USA
Funding
National Institutes of Health (USA);
Keywords
PCA; admixture; cryptic relatedness; pedigrees; GWAS; ASSOCIATION ANALYSIS; GENOME; MODEL; IDENTITY; KINSHIP; TOOL; SET;
DOI
10.1002/gepi.21896
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
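The two-stage idea in the abstract (PCA on an unrelated subset, then prediction for the remaining individuals) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the unrelated, ancestry-representative subset has already been identified and is passed in as a list of row indices. It computes only the first axis of variation, and it substitutes plain power iteration for a full eigendecomposition. The function names `standardize`, `top_snp_loading`, and `pc_scores` are hypothetical, as is the input layout (one row per individual, genotypes coded 0/1/2).

```python
import random


def standardize(genotypes, ref_rows):
    """Center and scale each SNP using allele frequencies from ref_rows only."""
    n_snps = len(genotypes[0])
    out = [[0.0] * n_snps for _ in genotypes]
    for s in range(n_snps):
        # Allele frequency estimated from the unrelated reference subset.
        p = sum(genotypes[i][s] for i in ref_rows) / (2.0 * len(ref_rows))
        p = min(max(p, 0.01), 0.99)  # assumption: clamp to avoid dividing by zero
        sd = (2.0 * p * (1.0 - p)) ** 0.5
        for i in range(len(genotypes)):
            out[i][s] = (genotypes[i][s] - 2.0 * p) / sd
    return out


def top_snp_loading(z_unrel, iters=200, seed=0):
    """Power iteration for the leading eigenvector of Z_U^T Z_U (SNP loadings)."""
    rng = random.Random(seed)
    n_snps = len(z_unrel[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    for _ in range(iters):
        # scores = Z_U w, then w = Z_U^T scores, then renormalize.
        scores = [sum(row[s] * w[s] for s in range(n_snps)) for row in z_unrel]
        w = [sum(z_unrel[i][s] * scores[i] for i in range(len(z_unrel)))
             for s in range(n_snps)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w


def pc_scores(genotypes, unrelated):
    """First-axis scores for everyone, with loadings learned from `unrelated` only."""
    z = standardize(genotypes, unrelated)
    w = top_snp_loading([z[i] for i in unrelated])
    # Individuals outside the subset are projected onto the same loadings.
    return [sum(zi[s] * w[s] for s in range(len(w))) for zi in z]
```

Because the loadings are learned from the unrelated subset alone, family structure among the remaining individuals cannot shape the axis; those individuals only receive projected scores. The sign of the axis is arbitrary.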
Pages: 276-293
Number of pages: 18