Optimal sample size for multiple testing: The case of gene expression microarrays

Times cited: 155
Authors
Müller, P
Parmigiani, G
Robert, C
Rousseau, J
Affiliations
[1] Univ Texas, MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
[2] Johns Hopkins Univ, Dept Oncol Biostat & Pathol, Baltimore, MD 21205 USA
[3] Univ Paris 09, CEREMADE, F-75775 Paris, France
[4] INSEE, CREST, F-75775 Paris, France
[5] Univ Paris 05, Paris, France
Keywords
false-discovery rate; genomic data analysis; multiple comparison
DOI
10.1198/016214504000001646
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We consider the choice of an optimal sample size for multiple-comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons across a large number of hypothesis tests. We discuss two decision problems in this setup: the sample size selection and the decision about the multiple comparisons. We adopt a decision-theoretic approach, using loss functions that combine the competing goals of discovering as many differentially expressed genes as possible while keeping the number of false discoveries manageable. For consistency, we use the same loss function for both decisions. The decision rule that emerges for the multiple-comparison problem takes the exact form of the rules proposed in the recent literature to control the posterior expected false-discovery rate. For the sample size selection, we combine the expected utility argument with an additional sensitivity analysis, reporting expected utilities conditional on assumed levels of true differential expression. We recognize the resulting diagnostic as a form of statistical power, which facilitates interpretation and communication. As a sampling model for observed gene expression densities across genes and arrays, we use a variation of a hierarchical gamma/gamma model. However, the discussion of the decision problem is independent of the chosen probability model. The approach is valid for any model that assigns positive prior probabilities to the null hypotheses in the multiple comparisons and that allows efficient marginal and posterior simulation, possibly by dependent Markov chain Monte Carlo simulation.
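The abstract states that the optimal multiple-comparison rule takes the form of rules that control the posterior expected false-discovery rate. For flag indicators d_i and posterior probabilities v_i, that quantity is sum_i d_i (1 - v_i) / max(1, sum_i d_i). The following is a minimal Python sketch of such a rule, not the authors' implementation or their loss-function derivation. It assumes the marginal posterior probabilities v_i that gene i is differentially expressed have already been computed, for instance as averages over MCMC draws. The function name `flag_by_posterior_fdr` and the parameter `alpha` are hypothetical. Genes are ranked by v_i and flagged from the top for as long as the posterior expected FDR of the flagged set stays at or below `alpha`. Ties in v_i are broken by input order.

```python
from typing import List


def flag_by_posterior_fdr(v: List[float], alpha: float) -> List[bool]:
    """Hypothetical helper: flag genes so that the posterior expected
    false-discovery rate of the flagged set is at most alpha.

    v[i] is assumed to be P(gene i is differentially expressed | data).
    Illustrative sketch only; not code from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Rank genes from most to least probably differentially expressed
    # (sorted() is stable, so ties keep their input order).
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    flags = [False] * len(v)
    expected_false = 0.0  # running sum of (1 - v_i) over flagged genes
    for count, i in enumerate(order, start=1):
        expected_false += 1.0 - v[i]
        # Posterior expected FDR if the top `count` genes are flagged.
        # With v sorted in decreasing order this running mean never
        # decreases, so the first violation ends the search.
        if expected_false / count > alpha:
            break
        flags[i] = True
    return flags


if __name__ == "__main__":
    posterior = [0.99, 0.95, 0.9, 0.6, 0.2]
    print(flag_by_posterior_fdr(posterior, alpha=0.1))
```

With these illustrative inputs, flagging the top three genes gives a posterior expected FDR of about 0.053. Adding the fourth would raise it to 0.14, so the sketch flags only the first three.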
Pages: 990-1001
Number of pages: 12