Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

Times cited: 105
Authors
Jeanmougin, Marine [1, 2, 3, 4]
de Reynies, Aurelien [1]
Marisa, Laetitia [1]
Paccard, Caroline [2]
Nuel, Gregory [3]
Guedj, Mickael [1, 2]
Affiliations
[1] Ligue Natl Canc, Programme Cartes Identite Tumeurs CIT, Paris, France
[2] Dept Biostat, Paris, France
[3] Paris Descartes Univ, Dept Appl Math MAPS, UMR CNRS 8145, Paris, France
[4] Univ Evry, Stat & Genome Lab, UMR CNRS 8071, Evry, France
Keywords
SAM;
DOI
10.1371/journal.pone.0012336
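Chinese Library Classification codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)];
Subject classification codes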
07 ; 0710 ; 09 ;
Abstract
High-throughput post-genomic studies are now routine and promising tools in biological and biomedical research. The main statistical approach to selecting genes that are differentially expressed between two groups is the t-test, which has been criticized in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we compare eight tests representative of the variance modeling strategies used for gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to draw comprehensive and robust conclusions about test performance in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. In addition, two tests (limma and VarMixt) show a significant improvement over the t-test, in particular when dealing with small sample sizes. Moreover, limma offers several practical advantages, so we advocate its use for analyzing gene expression data.
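The contrast between the t-test and the variance-modeling alternatives can be illustrated in a few lines of Python. The sketch below computes Welch's t-statistic and a moderated t-statistic in the spirit of limma, in which each gene's variance is shrunk toward a common prior value `s0_sq` carrying `d0` degrees of freedom. The function names and the fixed prior values are hypothetical: limma itself estimates `d0` and `s0_sq` from all genes by empirical Bayes, which this sketch does not do.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df


def moderated_t(x, y, d0, s0_sq):
    """limma-style moderated t with a FIXED prior (d0, s0_sq).

    Hypothetical simplification: limma estimates d0 and s0_sq from all
    genes by empirical Bayes; here they are passed in by hand.
    """
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2
    # Pooled (equal-variance) residual variance of this gene.
    s_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / d
    # Posterior variance: weighted average of prior and gene-wise variance.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t = (mean(x) - mean(y)) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d  # the moderated t has d0 + d degrees of freedom


# A gene whose three replicates per group happen to agree almost perfectly.
x = [5.00, 5.01, 5.02]
y = [4.90, 4.91, 4.92]
print(welch_t(x, y))               # t is about 12.25 with about 4 df
print(moderated_t(x, y, 4, 0.01))  # t is about 1.72 with 8 df
```

Here the tiny sample variance inflates Welch's statistic to about 12, whereas the moderated statistic, under the assumed prior, drops to about 1.7. This is the small-sample situation in which borrowing variance information across genes is expected to help.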
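Turning per-gene p-values into a gene list also requires a multiple-testing correction. One widely used choice is the step-up procedure of Benjamini and Hochberg (1995), which appears in the reference list and controls the false discovery rate, i.e. the expected proportion of false positives among the selected genes. The following is a minimal pure-Python sketch of BH-adjusted p-values; the function name is hypothetical, and in practice a library routine such as R's `p.adjust(method = "BH")` would normally be used.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step-up: walk from the largest p-value to the smallest,
    # keeping the adjusted values monotone in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
selected = [i for i, q in enumerate(adj) if q <= 0.05]
print(selected)  # [0]
```

Only the first gene is selected at the 0.05 level (adjusted value 0.04); the second and third are both adjusted to about 0.053 because the step-up rule enforces monotonicity.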
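With small sample sizes, the rank-based Wilcoxon test is limited by the discreteness of its null distribution. The sketch below computes its exact two-sided p-value by enumerating every assignment of ranks to the first group; it is a hypothetical helper written for illustration and assumes there are no tied values. With three samples per group there are only C(6, 3) = 20 possible rank assignments, so even perfect separation gives p = 2/20 = 0.1, and the test can never reject at the 0.05 level.

```python
from itertools import combinations


def rank_sum_exact_p(x, y):
    """Two-sided exact p-value of Wilcoxon's rank-sum test.

    Assumes no tied values, so every observation gets a distinct integer rank.
    """
    pooled = sorted(x + y)
    rank = {v: r for r, v in enumerate(pooled, start=1)}
    n, total = len(x), len(x) + len(y)
    observed = sum(rank[v] for v in x)
    centre = n * (total + 1) / 2  # mean of the rank sum under the null
    # Null distribution: all equally likely ways to give n of the ranks to group x.
    sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(1 for s in sums if abs(s - centre) >= abs(observed - centre))
    return extreme / len(sums)


# Perfect separation of 3 vs 3 samples: the most extreme outcome possible.
print(rank_sum_exact_p([5.0, 5.1, 5.2], [1.0, 1.1, 1.2]))  # 0.1
```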
Pages: 1-9
Number of pages: 9
Related papers (33 in total)
[1]   Microarray data analysis: from disarray to consolidation and consensus [J].
Allison, DB ;
Cui, XQ ;
Page, GP ;
Sabripour, M .
NATURE REVIEWS GENETICS, 2006, 7 (01) :55-65
[2]   Controlling the false discovery rate: a practical and powerful approach to multiple testing [J].
Benjamini, Y ;
Hochberg, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[3]   Exquisite sensitivity of TP53 mutant and basal breast cancers to a dose-dense epirubicin-cyclophosphamide regimen [J].
Bertheau, Philippe ;
Turpin, Elisabeth ;
Rickman, David S. ;
Espie, Marc ;
de Reynies, Aurelien ;
Feugeas, Jean-Paul ;
Plassa, Louis-Francois ;
Soliman, Hany ;
Varna, Mariana ;
de Roquancourt, Anne ;
Lehmann-Che, Jacqueline ;
Beuzard, Yves ;
Marty, Michel ;
Misset, Jean-Louis ;
Janin, Anne ;
de The, Hugues .
PLOS MEDICINE, 2007, 4 (03) :585-594
[4]   Transcriptome classification of HCC is related to gene alterations and to new therapeutic targets [J].
Boyault, Sandrine ;
Rickman, David S. ;
de Reynies, Aurelien ;
Balabaud, Charles ;
Rebouissou, Sandra ;
Jeannot, Emmanuelle ;
Herault, Aurelie ;
Saric, Jean ;
Belghiti, Jacques ;
Franco, Dominique ;
Bioulac-Sage, Paulette ;
Laurent-Puig, Pierre ;
Zucman-Rossi, Jessica .
HEPATOLOGY, 2007, 45 (01) :42-52
[5]  
Chessel D., 2004, R NEWS, V4, P5, DOI 10.2307/3780087
[6]   VarMixt: efficient variance modelling for the differential analysis of replicated gene expression data [J].
Delmar, P ;
Robin, S ;
Daudin, JJ .
BIOINFORMATICS, 2005, 21 (04) :502-508
[7]  
Dudoit S., 2002, MULTIPLE HYPOTHESIS
[8]  
HUANG X, 2002, FUNCTIONAL INTEGRATI, V2
[9]   A structural mixed model for variances in differential gene expression studies [J].
Jaffrezic, Florence ;
Marot, Guillemette ;
Degrelle, Severine ;
Hue, Isabelle ;
Foulley, Jean-Louis .
GENETICS RESEARCH, 2007, 89 (01) :19-25
[10]   Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays [J].
Jain, N ;
Thatte, J ;
Braciale, T ;
Ley, K ;
O'Connell, M ;
Lee, JK .
BIOINFORMATICS, 2003, 19 (15) :1945-1951