Structure-informed clustering for population stratification in association studies

Cited by: 2
Authors
Bose, Aritra [1]
Burch, Myson [1,2]
Chowdhury, Agniva [3]
Paschou, Peristera [4]
Drineas, Petros [2]
Affiliations
[1] IBM TJ Watson Res Ctr, Computat Genom, Yorktown Hts, NY USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN USA
[4] Purdue Univ, Dept Biol Sci, W Lafayette, IN USA
Keywords
Association studies; Population structure; Clustering; LINKAGE-DISEQUILIBRIUM; HERITABILITY; SELECTION
DOI
10.1186/s12859-023-05511-w
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification that is unrelated to disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.

Results: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.

Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
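
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the CluStrat implementation: the function name `mahalanobis_clusters`, the `ridge` regularization term, the average-linkage choice, and the toy genotype data are all assumptions made for illustration. It only shows how pairwise Mahalanobis distances, computed from the marker covariance matrix, can feed an agglomerative hierarchical clustering of individuals.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def mahalanobis_clusters(genotypes, n_clusters, ridge=1e-3):
    """Cluster individuals (rows) using Mahalanobis distances over markers (columns).

    Hypothetical helper for illustration; not taken from the CluStrat code base.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                    # center each marker
    cov = np.cov(X, rowvar=False)             # marker-by-marker covariance (reflects LD)
    cov += ridge * np.eye(cov.shape[0])       # assumption: small ridge keeps the matrix invertible
    inv_cov = np.linalg.inv(cov)
    # Condensed vector of pairwise Mahalanobis distances between individuals
    dists = pdist(X, metric="mahalanobis", VI=inv_cov)
    # Agglomerative hierarchical clustering (average linkage is an assumption)
    tree = linkage(dists, method="average")
    # Cut the tree into at most n_clusters groups; labels start at 1
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: two simulated populations with different allele frequencies
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.1, size=(20, 10))
pop_b = rng.binomial(2, 0.9, size=(20, 10))
labels = mahalanobis_clusters(np.vstack([pop_a, pop_b]), n_clusters=2)
print(labels)
```

In a stratified association analysis, cluster labels of this kind could then be used to test markers within each cluster; how CluStrat combines the per-cluster results is not stated in the abstract, so that step is left out here.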
Pages: 13
Related papers
50 in total
  • [1] Structure-informed clustering for population stratification in association studies
    Aritra Bose
    Myson Burch
    Agniva Chowdhury
    Peristera Paschou
    Petros Drineas
    BMC Bioinformatics, 24
  • [2] Structure-Informed Shadow Removal Networks
    Liu, Yuhao
    Guo, Qing
    Fu, Lan
    Ke, Zhanghan
    Xu, Ke
    Feng, Wei
    Tsang, Ivor W.
    Lau, Rynson W. H.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5823 - 5836
  • [3] Structure-informed separation of bioactive peptides
    Acquah, Caleb
    Chan, Yi Wei
    Pan, Sharadwata
    Agyei, Dominic
    Udenigwe, Chibuike C.
    JOURNAL OF FOOD BIOCHEMISTRY, 2019, 43 (01) : 1 - 10
  • [4] STRUCTURE-INFORMED POSITIONAL ENCODING FOR MUSIC GENERATION
    Agarwal, Manvi
    Wang, Changhong
    Richard, Gael
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 951 - 955
  • [5] A Structure-Informed Atlas of Human-Virus Interactions
    Lasso, Gorka
    Mayer, Sandra V.
    Winkelmann, Evandro R.
    Chu, Tim
    Elliot, Oliver
    Patino-Galindo, Juan Angel
    Park, Kernyu
    Rabadan, Raul
    Honig, Barry
    Shapira, Sagi D.
    CELL, 2019, 178 (06) : 1526 - +
  • [6] Structure-informed microbial population genetics elucidate selective pressures that shape protein evolution
    Kiefl, Evan
    Esen, Ozcan C.
    Miller, Samuel E.
    Kroll, Kourtney L.
    Willis, Amy D.
    Rappe, Michael S.
    Pan, Tao
    Eren, A. Murat
    SCIENCE ADVANCES, 2023, 9 (08)
  • [7] Structure-informed insights for NLR functioning in plant immunity
    Sukarta, Octavina C. A.
    Slootweg, Erik J.
    Goverse, Aska
    SEMINARS IN CELL & DEVELOPMENTAL BIOLOGY, 2016, 56 : 134 - 149
  • [8] Structure-informed detection and quantification of peptides in food and biological fluids
    Agyei, Dominic
    Pan, Sharadwata
    Acquah, Caleb
    Bekhit, Alaa El-Din Ahmed
    Danquah, Michael K.
    JOURNAL OF FOOD BIOCHEMISTRY, 2019, 43 (01)
  • [9] PrePPI: a structure-informed database of protein-protein interactions
    Zhang, Qiangfeng Cliff
    Petrey, Donald
    Garzon, Jose Ignacio
    Deng, Lei
    Honig, Barry
    NUCLEIC ACIDS RESEARCH, 2013, 41 (D1) : D828 - D833
  • [10] Population Stratification in Secondary Genetic Association Studies
    Babron, M. -C
    Benhamou, S.
    Genin, E.
    Kazma, R.
    HUMAN HEREDITY, 2015, 79 (01) : 30 - 30