Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

Cited by: 105

Authors:

Jeanmougin, Marine [1,2,3,4]

de Reynies, Aurelien [1]

Marisa, Laetitia [1]

Paccard, Caroline [2]

Nuel, Gregory [3]

Guedj, Mickael [1,2]

Affiliations:

[1] Ligue Natl Canc, Programme Cartes Identite Tumeurs CIT, Paris, France

[2] Dept Biostat, Paris, France

[3] Paris Descartes Univ, Dept Appl Math MAPS, UMR CNRS 8145, Paris, France

[4] Univ Evry, Stat & Genome Lab, UMR CNRS 8071, Evry, France

Source:

PLOS ONE | 2010 / Volume 5 / Issue 09

Keywords:

SAM;

DOI:

10.1371/journal.pone.0012336

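Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];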

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To provide elements to answer, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Besides, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular to deal with small sample sizes. In addition limma presents several practical advantages, so we advocate its application to analyze gene expression data.

Pages: 1 / 9

Number of pages: 9

References (33 entries in total):

[1] Microarray data analysis: from disarray to consolidation and consensus [J].

Allison, DB; Cui, XQ; Page, GP; Sabripour, M.

NATURE REVIEWS GENETICS, 2006, 7 (01) :55-65

[2] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].

BENJAMINI, Y; HOCHBERG, Y.

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300

[3] Exquisite sensitivity of TP53 mutant and basal breast cancers to a dose-dense epirubicin-cyclophosphamide regimen [J].

Bertheau, Philippe; Turpin, Elisabeth; Rickman, David S.; Espie, Marc; de Reynies, Aurelien; Feugeas, Jean-Paul; Plassa, Louis-Francois; Soliman, Hany; Varna, Mariana; de Roquancourt, Anne; Lehmann-Che, Jacqueline; Beuzard, Yves; Marty, Michel; Misset, Jean-Louis; Janin, Anne; de The, Hugues.

PLOS MEDICINE, 2007, 4 (03) :585-594

[4] Transcriptome classification of HCC is related to gene alterations and to new therapeutic targets [J].

Boyault, Sandrine; Rickman, David S.; de Reynies, Aurelien; Balabaud, Charles; Rebouissou, Sandra; Jeannot, Emmanuelle; Herault, Aurelie; Saric, Jean; Belghiti, Jacques; Franco, Dominique; Bioulac-Sage, Paulette; Laurent-Puig, Pierre; Zucman-Rossi, Jessica.

HEPATOLOGY, 2007, 45 (01) :42-52

[5] Chessel D., 2004, R NEWS, V4, P5, DOI 10.2307/3780087

[6] VarMixt: efficient variance modelling for the differential analysis of replicated gene expression data [J].

Delmar, P; Robin, S; Daudin, JJ.

BIOINFORMATICS, 2005, 21 (04) :502-508

[7] Dudoit S., 2002, MULTIPLE HYPOTHESIS

[8] HUANG X, 2002, FUNCTIONAL INTEGRATI, V2

[9] A structural mixed model for variances in differential gene expression studies [J].

Jaffrezic, Florence; Marot, Guillemette; Degrelle, Severine; Hue, Isabelle; Foulley, Jean-Louis.

GENETICS RESEARCH, 2007, 89 (01) :19-25

[10] Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays [J].

Jain, N; Thatte, J; Braciale, T; Ley, K; O'Connell, M; Lee, JK.

BIOINFORMATICS, 2003, 19 (15) :1945-1951
