Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates

Cited by: 73
Author
Efron, Bradley [1]
Affiliation
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA)
Keywords
Acceleration; Correlation penalty; Empirical process; Mehler's identity; Nonnull z-values; Rms correlation; BOOTSTRAP CONFIDENCE-INTERVALS; EMPIRICAL BAYES; NORMALIZATION; MICROARRAYS; DISCOVERY; VARIANCE;
DOI
10.1198/jasa.2010.tm09129
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We consider large-scale studies in which there are hundreds or thousands of correlated cases to investigate, each represented by its own normal variate, typically a z-value. A familiar example is provided by a microarray experiment comparing healthy with sick subjects' expression levels for thousands of genes. This paper concerns the accuracy of summary statistics for the collection of normal variates, such as their empirical cdf or a false discovery rate statistic. It seems like we must estimate an N by N correlation matrix, N the number of cases, but our main result shows that this is not necessary: good accuracy approximations can be based on the root mean square correlation over all N(N - 1)/2 pairs, a quantity often easily estimated. A second result shows that z-values closely follow normal distributions even under nonnull conditions, supporting application of the main theorem. Practical application of the theory is illustrated for a large leukemia microarray study.
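As an illustration of the root mean square correlation mentioned in the abstract, the following is a minimal Python sketch that computes the naive sample version over all N(N - 1)/2 pairs of cases. The function name `rms_correlation`, the rows-as-cases layout, and the simulated data are assumptions made for this example and are not taken from the paper. The sketch applies no bias correction, so sampling noise inflates the result: for independent normal rows with n columns, the mean squared sample correlation is about 1/(n - 1) rather than 0.

```python
import numpy as np


def rms_correlation(x):
    """Naive RMS of the sample correlations over all N(N-1)/2 pairs of rows.

    x is an N x n array: one row per case (e.g. gene), one column per subject.
    This is an uncorrected illustration, not the paper's estimator.
    """
    x = np.asarray(x, dtype=float)
    corr = np.corrcoef(x)                      # N x N sample correlation matrix
    n_cases = corr.shape[0]
    upper = np.triu_indices(n_cases, k=1)      # the N(N-1)/2 distinct pairs
    return float(np.sqrt(np.mean(corr[upper] ** 2)))


# Hypothetical data: 200 independent cases measured on 50 subjects.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 50))
alpha_hat = rms_correlation(x)
print(f"naive RMS correlation: {alpha_hat:.3f}")
```

Forming the full N x N matrix is affordable for a few thousand cases. For much larger N, a random sample of pairs could stand in for the full set, at the cost of extra sampling variability.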
Pages: 1042-1055
Number of pages: 14