Sparse Principal Component Analysis for Identifying Ancestry-Informative Markers in Genome-Wide Association Studies

Cited by: 31
Authors
Lee, Seokho [2 ]
Epstein, Michael P. [3 ]
Duncan, Richard [3 ]
Lin, Xihong [1 ]
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Hankuk Univ Foreign Studies, Dept Stat, Yongin, South Korea
[3] Emory Univ, Sch Med, Dept Human Genet, Atlanta, GA USA
Funding
National Institutes of Health (USA); National Research Foundation of Singapore;
Keywords
ancestry-informative markers; genome-wide association studies; population stratification; principal component analysis; variable selection; POPULATION STRATIFICATION; SEMIPARAMETRIC TEST; ADMIXTURE; PANEL; MAP;
DOI
10.1002/gepi.21621
Chinese Library Classification (CLC) number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Genome-wide association studies (GWAS) routinely apply principal component analysis (PCA) to infer population structure within a sample and thereby correct for confounding due to ancestry. GWAS implementations of PCA use tens of thousands of single-nucleotide polymorphisms (SNPs) to infer structure, even though only a small fraction of those SNPs provides useful information on ancestry. Identifying this reduced set of ancestry-informative markers (AIMs) from a GWAS has practical value; for example, researchers can genotype the AIM set to correct for potential confounding due to ancestry in follow-up studies that use custom SNP or sequencing technology. We propose a novel technique to identify AIMs from genome-wide SNP data using sparse PCA. The procedure uses penalized regression methods to identify those SNPs in a genome-wide panel that contribute significantly to the principal components, while encouraging SNPs with negligible loadings to vanish from the analysis. We found that sparse PCA leads to negligible loss of ancestry information compared to traditional PCA of genome-wide SNP data. We further demonstrate the value of sparse PCA for AIM selection using real data from the International HapMap Project and a genome-wide study of inflammatory bowel disease. We have implemented our approach in open-source R software for public use. Genet. Epidemiol. 36:293-302, 2012. (c) 2012 Wiley Periodicals, Inc.
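The idea in the abstract, penalized regression that shrinks negligible SNP loadings to exactly zero, can be illustrated with a small sketch. This is not the authors' R software. It is a generic rank-one sparse PCA that alternates between a soft-thresholded regression step for the loadings and a renormalization of the scores, run on simulated genotypes. The function names (`soft_threshold`, `sparse_pc`), the penalty value `lam = 5.0`, and the simulated allele frequencies are all assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso-style shrinkage: pull entries toward zero, zeroing small ones."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_pc(X, lam, n_iter=100, tol=1e-8):
    """First sparse principal component of a column-centered matrix X.

    Alternates between (1) a penalized regression of X on the current
    scores u, which reduces to soft-thresholding X.T @ u, and
    (2) recomputing unit-norm scores from the sparse loadings.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]                       # start from the ordinary PC1 scores
    v = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v_new = soft_threshold(X.T @ u, lam)
        if not v_new.any():           # penalty too large: every loading vanished
            return u, v_new
        u = X @ v_new
        u /= np.linalg.norm(u)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return u, v

# Simulated data (hypothetical): two populations of 100 individuals, 500 SNPs.
# Only the first 20 SNPs differ in allele frequency between populations.
rng = np.random.default_rng(0)
n_per_pop, n_snps, n_aims = 100, 500, 20
freq_a = np.full(n_snps, 0.3)
freq_b = np.full(n_snps, 0.3)
freq_a[:n_aims] = 0.1
freq_b[:n_aims] = 0.9
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
X = genotypes - genotypes.mean(axis=0)   # center each SNP column

u, v = sparse_pc(X, lam=5.0)
selected = np.flatnonzero(v)             # SNPs with nonzero loading: candidate AIMs

# Compare the sparse scores with the ordinary PC1 scores.
pc1 = np.linalg.svd(X, full_matrices=False)[0][:, 0]
corr = abs(np.corrcoef(u, pc1)[0, 1])
print("number of selected SNPs:", selected.size)
print("abs. correlation with ordinary PC1:", round(corr, 3))
```

In practice the penalty would be tuned rather than fixed, for example by cross-validation or by targeting a desired panel size, and further components would be extracted after deflating `X`. Both steps are omitted here to keep the sketch short.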
Pages: 293-302
Number of pages: 10