Bayesian optimal discovery procedure for simultaneous significance testing

Cited by: 15
Authors
Cao, Jing [1]
Xie, Xian-Jin [2]
Zhang, Song [2]
Whitehurst, Angelique [3 ]
White, Michael A. [3 ]
Affiliations
[1] So Methodist Univ, Dept Stat Sci, Dallas, TX 75275 USA
[2] Univ Texas SW Med Ctr Dallas, Dept Clin Sci, Dallas, TX 75390 USA
[3] Univ Texas SW Med Ctr Dallas, Dept Cell Biol, Dallas, TX 75390 USA
Keywords
DIFFERENTIALLY EXPRESSED GENES; STATISTICAL TESTS; T-TEST; MICROARRAY; REANALYSIS; DATASET;
DOI
10.1186/1471-2105-10-5
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: High-throughput screening, such as differential gene expression screening, drug sensitivity screening, and genome-wide RNAi screening, requires tens of thousands of tests to be conducted simultaneously. However, the number of replicate measurements per test is extremely small, rarely exceeding 3. Several current approaches demonstrate that test statistics with shrunken variance estimates have more power than the traditional t statistic.
Results: We propose a Bayesian hierarchical model that incorporates the shrinkage concept by introducing a mixture structure on the variance components. The estimates from the Bayesian model are used in the optimal discovery procedure (ODP) proposed by Storey in 2007, which was shown to have optimal performance in multiple significance tests. We compared the performance of the Bayesian ODP with several competing test statistics.
Conclusion: We conducted simulation studies with 2 to 6 replicates per gene and also included test results from two real datasets. The Bayesian ODP outperforms the other methods in our study, including the original ODP. The advantage of the Bayesian ODP becomes more pronounced when there are few replicates per test. The improvement over the original ODP arises because the Bayesian model borrows strength across genes in estimating unknown parameters. The proposed approach is computationally efficient owing to the conjugate structure of the Bayesian model. R code to calculate the Bayesian ODP is provided (see Additional file 1).
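The shrinkage idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Bayesian hierarchical model and not the ODP. It is a generic shrunken-variance t-like statistic in which each gene's variance estimate is pulled toward an across-gene average. The function names, the fixed `weight`, the simulated data, and the choice of the mean variance as the shrinkage target are all assumptions made for illustration.

```python
import random
import statistics


def pooled_within_variance(x, y):
    """Equal-variance pooled estimate of one gene's within-group variance."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * statistics.variance(x)
            + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)


def shrunken_t(x, y, prior_var, weight):
    """t-like statistic with the gene's variance pulled toward prior_var.

    weight = 0 reproduces the ordinary equal-variance two-sample t statistic;
    weight = 1 ignores the gene's own variance entirely.
    Both prior_var and weight are hypothetical tuning inputs in this sketch.
    """
    nx, ny = len(x), len(y)
    s2 = (1 - weight) * pooled_within_variance(x, y) + weight * prior_var
    se = (s2 * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


# Simulated null data: 1000 genes, 3 replicates per group (assumed sizes).
random.seed(0)
genes = [([random.gauss(0, 1) for _ in range(3)],
          [random.gauss(0, 1) for _ in range(3)]) for _ in range(1000)]

# Borrow strength across genes: the average per-gene variance is the target.
prior_var = statistics.mean(pooled_within_variance(x, y) for x, y in genes)

plain = [shrunken_t(x, y, prior_var, 0.0) for x, y in genes]
shrunk = [shrunken_t(x, y, prior_var, 0.5) for x, y in genes]

# With few replicates, a very small variance estimate can inflate the plain
# statistic; shrinkage bounds the variance away from zero.
print(max(abs(t) for t in plain), max(abs(t) for t in shrunk))
```

With `weight` set to 0 the function reduces to the traditional t statistic, so the two lists can be compared directly to see how much the shrinkage target influences genes whose own variance estimate happens to be very small.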
Pages: 15