Structure-informed clustering for population stratification in association studies

Cited by: 2
Authors
Bose, Aritra [1]
Burch, Myson [1,2]
Chowdhury, Agniva [3]
Paschou, Peristera [4]
Drineas, Petros [2]
Affiliations
[1] IBM TJ Watson Res Ctr, Computat Genom, Yorktown Hts, NY USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN USA
[4] Purdue Univ, Dept Biol Sci, W Lafayette, IN USA
Keywords
Association studies; Population structure; Clustering; LINKAGE-DISEQUILIBRIUM; HERITABILITY; SELECTION
DOI
10.1186/s12859-023-05511-w
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Identifying variants associated with complex traits is challenging in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification that is unrelated to disease risk. Existing methods for population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.
Results: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance-based covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.
Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
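The contrast the abstract draws between Mahalanobis and Euclidean distance in agglomerative clustering can be illustrated with a small toy sketch. This is not the CluStrat implementation: the 2-D points stand in for correlated marker data, the covariance is estimated from the whole toy sample, average linkage is an arbitrary choice, and all function names are hypothetical.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def covariance_2d(points):
    """Sample covariance matrix of 2-D points, as ((sxx, sxy), (sxy, syy))."""
    n = len(points)
    mx = mean([p[0] for p in points])
    my = mean([p[1] for p in points])
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def euclidean(u, v):
    return ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) ** 0.5


def make_mahalanobis(inv_cov):
    """Return a distance function d(u, v) = sqrt((u-v)^T S^-1 (u-v))."""
    def dist(u, v):
        dx, dy = u[0] - v[0], u[1] - v[1]
        q = (dx * (inv_cov[0][0] * dx + inv_cov[0][1] * dy)
             + dy * (inv_cov[1][0] * dx + inv_cov[1][1] * dy))
        return max(q, 0.0) ** 0.5  # guard against tiny negative round-off
    return dist


def agglomerate(points, dist, n_clusters):
    """Naive average-linkage agglomerative clustering on point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = mean([dist(points[i], points[j])
                      for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data (assumption): two parallel, strongly correlated groups.
group_a = [(float(i), float(i)) for i in range(5)]        # indices 0-4
group_b = [(float(i), float(i + 1)) for i in range(5)]    # indices 5-9
points = group_a + group_b

inv_cov = inverse_2x2(covariance_2d(points))
maha_clusters = agglomerate(points, make_mahalanobis(inv_cov), 2)
eucl_clusters = agglomerate(points, euclidean, 2)

print("Mahalanobis:", maha_clusters)
print("Euclidean:  ", eucl_clusters)
```

In this toy setting the Mahalanobis distance rescales the data by the inverse covariance, so separation along the low-variance direction counts for more and the two correlated groups are recovered. The Euclidean version first merges neighbouring points from different groups, so it cannot recover the same partition.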
Pages: 13