simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics

Cited by: 16
Authors
Fortune, Mary D. [1, 2]
Wallace, Chris [1, 2]
Affiliations
[1] Univ Cambridge, Cambridge Inst Publ Hlth, MRC Biostat Unit, Cambridge Biomed Campus, Cambridge CB2 0SR, England
[2] Univ Cambridge, Addenbrookes Hosp, Dept Med, Cambridge CB2 0SP, England
Funding
Wellcome Trust (UK);
Keywords
GENOME-WIDE ASSOCIATION; VARIANTS;
DOI
10.1093/bioinformatics/bty898
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. The ideal validation for such methods is application to simulated data, where some 'truth' is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study.
Results: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces output very similar to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods, as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis.
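The idea of "simulating random variates about these expected values" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the vector of Z scores in a region is approximately multivariate normal, with the expected Z scores as its mean and the SNP correlation (LD) matrix as its covariance. The abstract does not state this, so treat it as an assumption of the sketch. The function name `simulate_z_scores`, the toy LD matrix and the expected Z scores are all hypothetical; in the paper the expected values are derived from causal variants, effect sizes and control haplotype frequencies, whereas here they are simply taken as given.

```python
import numpy as np


def simulate_z_scores(expected_z, ld, n_reps, seed=0):
    """Draw n_reps vectors of Z scores around expected_z.

    Assumption (not from the abstract): Z ~ MVN(expected_z, ld),
    where ld is the SNP correlation matrix.
    """
    rng = np.random.default_rng(seed)
    expected_z = np.asarray(expected_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    # One row per simulated study, one column per SNP.
    return rng.multivariate_normal(mean=expected_z, cov=ld, size=n_reps)


# Toy three-SNP region; every number below is hypothetical.
ld = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
expected_z = np.array([5.0, 4.0, 0.5])

z = simulate_z_scores(expected_z, ld, n_reps=20000)
print(z.shape)                                  # (20000, 3)
print(np.round(z.mean(axis=0), 2))              # close to expected_z
print(np.round(np.corrcoef(z, rowvar=False), 2))  # close to ld
```

The cost of each draw depends on the number of SNPs in the region, not on the number of individuals. That is the sense in which simulation of summary output can be independent of sample size; in this sketch, sample size would enter only through the expected Z scores passed in.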
Pages: 1901-1906
Page count: 6