Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies

Times cited: 125
Authors
Dudbridge, F
Koeleman, BPC
Affiliations
[1] MRC, Biostat Unit, Cambridge CB2 2SR, England
[2] MRC, Rosalind Franklin Ctr Gen Res, Cambridge CB2 2SR, England
[3] Univ Med Ctr Utrecht, Dept Med Genet, Utrecht, Netherlands
DOI
10.1086/423738
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Large exploratory studies, including candidate-gene-association testing, genomewide linkage-disequilibrium scans, and array-expression experiments, are becoming increasingly common. A serious problem for such studies is that statistical power is compromised by the need to control the false-positive rate for a large family of tests. Because multiple true associations are anticipated, methods have been proposed that combine evidence from the most significant tests, as a more powerful alternative to individually adjusted tests. The practical application of these methods is currently limited by a reliance on permutation testing to account for the correlated nature of single-nucleotide polymorphism (SNP)-association data. On a genomewide scale, this is both very time-consuming and impractical for repeated explorations with standard marker panels. Here, we alleviate these problems by fitting analytic distributions to the empirical distribution of combined evidence. We fit extreme-value distributions for fixed lengths of combined evidence and a beta distribution for the most significant length. An initial phase of permutation sampling is required to fit these distributions, but it can be completed more quickly than a simple permutation test and need be done only once for each panel of tests, after which the fitted parameters give a reusable calibration of the panel. Our approach is also a more efficient alternative to a standard permutation test. We demonstrate the accuracy of our approach and compare its efficiency with that of permutation tests on genomewide SNP data released by the International HapMap Consortium. The estimation of analytic distributions for combined evidence will allow these powerful methods to be applied more widely in large exploratory studies.
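A minimal Python sketch of the calibration idea described in the abstract follows. It assumes that the combined evidence for a fixed length L is the rank truncated product of the L smallest p-values (see reference [12]) and models that statistic's null distribution with a Gumbel (type I extreme-value) law. It also reads "a beta distribution for the most significant length" as a beta law fitted to the null minimum, across lengths, of the Gumbel-based p-values. The paper's exact distribution families, parametrizations and fitting procedure may differ. Permutation replicates are replaced by simulated null z-scores with AR(1) correlation between neighbouring tests, a hypothetical stand-in for linkage disequilibrium; in a real study each replicate would come from permuting phenotype labels. All function names, the lengths (1, 2, 5, 10) and the correlation 0.6 are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy import stats


def simulate_null_z(n_tests, n_reps, rho, rng):
    """Hypothetical stand-in for permutation replicates: null z-scores in which
    neighbouring tests have AR(1) correlation `rho` (a crude proxy for LD)."""
    z = np.empty((n_reps, n_tests))
    z[:, 0] = rng.standard_normal(n_reps)
    innovation_sd = np.sqrt(1.0 - rho**2)  # keeps each z-score at unit variance
    for j in range(1, n_tests):
        z[:, j] = rho * z[:, j - 1] + innovation_sd * rng.standard_normal(n_reps)
    return z


def two_sided_p(z):
    """Two-sided normal p-values for z-scores."""
    return 2.0 * stats.norm.sf(np.abs(z))


def truncated_product_stat(pvals, length):
    """Combined evidence from the `length` most significant tests:
    minus the log of the product of the `length` smallest p-values."""
    smallest = np.sort(pvals, axis=-1)[..., :length]
    return -np.log(smallest).sum(axis=-1)


def calibrate_panel(null_pvals, lengths):
    """One-off calibration of a panel from null replicates (rows of `null_pvals`).
    Returns Gumbel (loc, scale) per fixed length and beta (a, b) for the best length."""
    null_stats = {L: truncated_product_stat(null_pvals, L) for L in lengths}
    gumbel = {L: stats.gumbel_r.fit(null_stats[L]) for L in lengths}
    # Fitted p-value of each replicate at each length; keep the smallest per replicate.
    per_length_p = np.column_stack(
        [stats.gumbel_r.sf(null_stats[L], *gumbel[L]) for L in lengths]
    )
    # Clip into the open interval (0, 1) so the beta fit on [0, 1] is well defined.
    min_p = np.clip(per_length_p.min(axis=1), 1e-12, 1.0 - 1e-12)
    a, b, _, _ = stats.beta.fit(min_p, floc=0, fscale=1)
    return gumbel, (a, b)


def panel_pvalue(pvals, gumbel, beta_params):
    """Overall p-value for one dataset, using only the stored calibration."""
    per_length_p = [
        stats.gumbel_r.sf(truncated_product_stat(pvals, L), *params)
        for L, params in gumbel.items()
    ]
    return float(stats.beta.cdf(min(per_length_p), *beta_params))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tests, rho = 200, 0.6
    null_p = two_sided_p(simulate_null_z(n_tests, 2000, rho, rng))
    gumbel, beta_params = calibrate_panel(null_p, lengths=(1, 2, 5, 10))

    z_obs = simulate_null_z(n_tests, 1, rho, rng)[0]
    z_obs[[10, 50, 90, 130, 170]] = 6.0  # plant five strong associations
    print(panel_pvalue(two_sided_p(z_obs), gumbel, beta_params))
```

After the one-off `calibrate_panel` run, scoring a new dataset on the same panel needs only the stored Gumbel and beta parameters and no further resampling, which mirrors the reusable calibration the abstract describes. How accurate the small p-values are depends on how well the fitted families extrapolate beyond the range covered by the null replicates.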
Pages: 424-435
Number of pages: 12
Related papers (46 in total)
[11] Dudbridge, F. Pedigree disequilibrium tests for multilocus haplotypes [J]. Genetic Epidemiology, 2003, 25(2): 115-121.
[12] Dudbridge, F; Koeleman, BPC. Rank truncated product of P-values, with application to genomewide association scans [J]. Genetic Epidemiology, 2003, 25(4): 360-366.
[13] Fisher, RA. Statistical Methods for Research Workers. 1932.
[14] Gumbel, EJ. Statistics of Extremes. 1958.
[15] Hao, K; Xu, X; Laird, N; Wang, XB; Xu, XP. Power estimation of multiple SNP association test of case-control study and application [J]. Genetic Epidemiology, 2004, 26(1): 22-30.
[16] Hoh, J; Wille, A; Ott, J. Trimming, weighting, and grouping SNPs in human case-control association studies [J]. Genome Research, 2001, 11(12): 2115-2119.
[17] Hoh, J; Ott, J. Mathematical multi-locus approaches to localizing complex human trait genes [J]. Nature Reviews Genetics, 2003, 4(9): 701-709.
[18] Johnson, GCL; Koeleman, BPC; Todd, JA. Limitations of stratifying sib-pair data in common disease linkage studies: An example using chromosome 10p14-10q11 in type 1 diabetes [J]. American Journal of Medical Genetics, 2002, 113(2): 158-166.
[19] Karlin, S; Altschul, SF. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes [J]. Proceedings of the National Academy of Sciences of the United States of America, 1990, 87(6): 2264-2268.
[20] Li, W. 7th International Conference on Computational Molecular Biology. 2003.