Clustering by genetic ancestry using genome-wide SNP data

Cited by: 36
Authors
Solovieff, Nadia [1]
Hartley, Stephen W. [1]
Baldwin, Clinton T. [2]
Perls, Thomas T. [3]
Steinberg, Martin H.
Sebastiani, Paola [1]
Affiliations
[1] Boston Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02118 USA
[2] Boston Univ, Sch Med, Ctr Human Genet, Boston, MA 02118 USA
[3] Boston Univ, Sch Med, Dept Med, Div Geriatr, Boston, MA 02118 USA
Source
BMC GENETICS | 2010 / Vol. 11
Funding
National Institutes of Health (USA);
Keywords
POPULATION STRATIFICATION; ASSOCIATION; LOCI; INFERENCE; SAMPLES; SET;
DOI
10.1186/1471-2156-11-108
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Population stratification can cause spurious associations in a genome-wide association study (GWAS). It occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than to the trait of interest. Principal components analysis (PCA) is the established approach to detect population substructure using genome-wide data and to adjust the genetic association for stratification by including the top principal components (PCs) in the analysis. An alternative is genetic matching of cases and controls, which, however, requires well-defined population strata for appropriate selection of cases and controls.
Results: We developed a novel algorithm to cluster individuals into groups with similar ancestral backgrounds based on the principal components computed by PCA. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls using the clusters assigned by the algorithm substantially reduces population stratification bias. Through simulation we show that the power of our method is higher than adjustment for PCs in certain situations.
Conclusions: In addition to reducing population stratification bias and improving power, matching creates a clean dataset free of population stratification, which can then be used to build prediction models without including variables to adjust for ancestry. The cluster assignments also allow for the estimation of genetic heterogeneity by examining cluster-specific effects.
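The general workflow the abstract describes (compute PCs, cluster individuals on them, then match cases and controls within clusters) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify: plain k-means stands in for the clustering step, and the simulated data, the 1:1 matching rule, and all function names (`top_pcs`, `kmeans`, `match_within_clusters`, `simulate`) are assumptions made for illustration only.

```python
import numpy as np


def top_pcs(genotypes, n_pcs=1):
    """PC scores of an individuals x SNPs matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                 # centre each SNP
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]          # drop monomorphic SNPs, scale the rest
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Stand-in for the paper's clustering algorithm (an assumption).
    """
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def match_within_clusters(labels, is_case, seed=0):
    """Keep equal numbers of cases and controls in every cluster (1:1 matching)."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    keep = []
    for c in np.unique(labels):
        cases = np.flatnonzero((labels == c) & is_case)
        controls = np.flatnonzero((labels == c) & ~is_case)
        n = min(len(cases), len(controls))
        if n == 0:
            continue                        # cluster cannot be matched, drop it
        keep.extend(rng.permutation(cases)[:n])
        keep.extend(rng.permutation(controls)[:n])
    return np.sort(np.array(keep, dtype=int))


def simulate(n_per_pop=60, n_snps=300, seed=1):
    """Two hypothetical populations; cases are over-represented in the second."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
    geno = np.vstack([
        rng.binomial(2, freq_a, (n_per_pop, n_snps)),
        rng.binomial(2, freq_b, (n_per_pop, n_snps)),
    ])
    pop = np.repeat([0, 1], n_per_pop)
    is_case = rng.random(2 * n_per_pop) < np.where(pop == 0, 0.3, 0.7)
    return geno, pop, is_case


geno, pop, is_case = simulate()
labels = kmeans(top_pcs(geno, n_pcs=1), k=2)   # one PC suffices for two groups
keep = match_within_clusters(labels, is_case)

for c in np.unique(labels[keep]):
    in_c = keep[labels[keep] == c]
    print(f"cluster {c}: {is_case[in_c].sum()} cases, {(~is_case[in_c]).sum()} controls")
```

In the matched subset every cluster contributes the same number of cases and controls, so case status is no longer associated with cluster membership. That is the property matching relies on to remove stratification, at the cost of discarding unmatched individuals.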
Pages: 16