Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

Cited by: 676
Authors
Li, Miao-Xin [2,3]
Yeung, Juilian M. Y.
Cherny, Stacey S. [4]
Sham, Pak C. [1,2,3,4]
Affiliations
[1] Univ Hong Kong, Dept Psychiat, LKS Fac Med, Pokfulam, Hong Kong, Peoples R China
[2] Univ Hong Kong, Ctr Reprod Dev & Growth, Pokfulam, Hong Kong, Peoples R China
[3] Univ Hong Kong, Genome Res Ctr, Pokfulam, Hong Kong, Peoples R China
[4] Univ Hong Kong, State Key Lab Cognit & Brain Sci, Pokfulam, Hong Kong, Peoples R China
Keywords
GENOME-WIDE ASSOCIATION; SINGLE-NUCLEOTIDE POLYMORPHISMS; LINKAGE DISEQUILIBRIUM; MULTIPLE; SIMULATION; ADJUSTMENT; COVERAGE; SCANS; POWER; TOOL
DOI
10.1007/s00439-011-1118-2
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of such a huge number of SNPs must be taken into account when interpreting statistical significance in genome-wide studies, but this is complicated by the non-independence of SNPs due to linkage disequilibrium (LD). Several groups have proposed using the effective number of independent markers (M_e) to adjust for multiple testing, but current methods for calculating M_e are limited in accuracy or computational speed. Here, we report a more robust and faster method for calculating M_e. Applying this efficient method, implemented in a free software tool named Genetic type 1 error calculator (GEC), we systematically examined M_e, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for the HapMap Project and 1000 Genomes Project datasets, which are widely used as reference panels in genotype imputation. Our results suggest a p-value threshold of ~10^-7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent thresholds elsewhere: ~5 × 10^-8 for current or merged commercial genotyping arrays, ~10^-8 for all common SNPs in the 1000 Genomes Project dataset, and ~5 × 10^-8 for common SNPs within genes only.
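The abstract describes turning an effective number of independent tests (M_e) into a genome-wide p-value threshold. The Python sketch below illustrates that general idea under stated assumptions; it is not GEC's algorithm. It estimates M_e from the eigenvalues of an SNP-by-SNP LD correlation matrix using two published eigenvalue-based estimators (Li and Ji, 2005; Cheverud, 2001, as modified by Nyholt, 2004) and converts M_e into a per-test threshold with the Šidák formula 1 - (1 - alpha)^(1/M_e). All function names are hypothetical, and the small pure-Python Jacobi eigensolver is only meant for toy matrices.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def li_ji_meff(eigenvalues, ndigits=9):
    """Li & Ji (2005): sum of f(|lambda|), f(x) = I(x >= 1) + (x - floor(x))."""
    total = 0.0
    for lam in eigenvalues:
        # Rounding guards against float noise, since f jumps at integers >= 2.
        x = round(abs(lam), ndigits)
        total += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return total


def cheverud_nyholt_meff(eigenvalues):
    """M_eff = 1 + (M - 1) * (1 - Var(lambda) / M)."""
    m = len(eigenvalues)
    if m == 1:
        return 1.0
    # Eigenvalues of an M x M correlation matrix average to 1 (trace = M).
    var = sum((lam - 1.0) ** 2 for lam in eigenvalues) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var / m)


def sidak_threshold(alpha, meff):
    """Per-test threshold keeping family-wise error at alpha over meff tests."""
    # -expm1(log1p(-alpha) / meff) == 1 - (1 - alpha) ** (1 / meff), but stable.
    return -math.expm1(math.log1p(-alpha) / meff)


# Toy LD matrix: SNPs 1 and 2 in perfect LD (r = 1), SNP 3 independent.
r = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
eigs = jacobi_eigenvalues(r)
print(li_ji_meff(eigs))            # 2.0
print(cheverud_nyholt_meff(eigs))  # ~2.333
print(sidak_threshold(0.05, 1e6))  # ~5.13e-08
```

In this toy case the two perfectly correlated SNPs count as a single test under the Li and Ji estimator, which gives M_e = 2. The last line shows that, under the Šidák assumption, a threshold near 5 × 10^-8 corresponds to roughly one million effective tests. The two estimators can disagree noticeably (2.0 versus about 2.33 here), which is one reason the choice of M_e method matters for the resulting threshold.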
Pages: 747-756
Number of pages: 10
Related papers (31 in total)
  • [1] Evaluating the effects of imputation on the power, coverage, and cost efficiency of genome-wide SNP platforms
    Anderson, Carl A.
    Pettersson, Fredrik H.
    Barrett, Jeffrey C.
    Zhuang, Joanna J.
    Ragoussis, Jiannis
    Cardon, Lon R.
    Morris, Andrew P.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2008, 83 (01) : 112 - 119
  • [2] Evaluating coverage of genome-wide association studies
    Barrett, Jeffrey C.
    Cardon, Lon R.
    [J]. NATURE GENETICS, 2006, 38 (06) : 659 - 662
  • [3] A simple correction for multiple comparisons in interval mapping genome scans
    Cheverud, JM
    [J]. HEREDITY, 2001, 87 (1) : 52 - 58
  • [4] So many correlated tests, so little time!: Rapid adjustment of P values for multiple correlated tests
    Conneely, Karen N.
    Boehnke, Michael
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 81 (06) : 1158 - 1168
  • [5] Estimation of significance thresholds for genomewide association scans
    Dudbridge, Frank
    Gusnanto, Arief
    [J]. GENETIC EPIDEMIOLOGY, 2008, 32 (03) : 227 - 234
  • [6] Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies
    Duggal, Priya
    Gillanders, Elizabeth M.
    Holmes, Taura N.
    Bailey-Wilson, Joan E.
    [J]. BMC GENOMICS, 2008, 9 (1)
  • [7] A second generation human haplotype map of over 3.1 million SNPs
    Frazer, Kelly A.
    Ballinger, Dennis G.
    Cox, David R.
    Hinds, David A.
    Stuve, Laura L.
    Gibbs, Richard A.
    Belmont, John W.
    Boudreau, Andrew
    Hardenbol, Paul
    Leal, Suzanne M.
    Pasternak, Shiran
    Wheeler, David A.
    Willis, Thomas D.
    Yu, Fuli
    Yang, Huanming
    Zeng, Changqing
    Gao, Yang
    Hu, Haoran
    Hu, Weitao
    Li, Chaohua
    Lin, Wei
    Liu, Siqi
    Pan, Hao
    Tang, Xiaoli
    Wang, Jian
    Wang, Wei
    Yu, Jun
    Zhang, Bo
    Zhang, Qingrun
    Zhao, Hongbin
    Zhao, Hui
    Zhou, Jun
    Gabriel, Stacey B.
    Barry, Rachel
    Blumenstiel, Brendan
    Camargo, Amy
    Defelice, Matthew
    Faggart, Maura
    Goyette, Mary
    Gupta, Supriya
    Moore, Jamie
    Nguyen, Huy
    Onofrio, Robert C.
    Parkin, Melissa
    Roy, Jessica
    Stahl, Erich
    Winchester, Ellen
    Ziaugra, Liuda
    Altshuler, David
    Shen, Yan
    [J]. NATURE, 2007, 449 (7164) : 851 - 861
  • [8] A New Measure of the Effective Number of Tests, a Practical Tool for Comparing Families of Non-Independent Significance Tests
    Galwey, Nicholas W.
    [J]. GENETIC EPIDEMIOLOGY, 2009, 33 (07) : 559 - 568
  • [9] A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms
    Gao, Xiaoyi
    Starmer, Joshua
    Martin, Eden R.
    [J]. GENETIC EPIDEMIOLOGY, 2008, 32 (04) : 361 - 369
  • [10] Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers
    Han, Buhm
    Kang, Hyun Min
    Eskin, Eleazar
    [J]. PLOS GENETICS, 2009, 5 (04)