Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets

Times cited: 88
Authors
Glazko, Galina V. [1]
Emmert-Streib, Frank [2]
Affiliations
[1] Univ Rochester, Med Ctr, Dept Biostat & Computat Biol, Rochester, NY 14642 USA
[2] Queens Univ Belfast, Sch Med Dent & Biomed Sci, Ctr Canc Res & Cell Biol, Belfast BT9 7BL, Antrim, Northern Ireland
Keywords
MICROARRAY; ENRICHMENT; CATEGORIES; FRAMEWORK
DOI
10.1093/bioinformatics/btp406
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Many univariate and several multivariate approaches have recently been proposed for testing differential expression of gene sets between phenotypes. Despite a wealth of literature studying their performance on simulated and real biological data, there is still a need to quantify their relative performance when they test different null hypotheses.

Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. The simulation study demonstrates that high correlations affect the power of univariate and multivariate tests equally. In addition, for most tests, power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set whose expression changes between the two phenotypes. Applying different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), which test different null hypotheses, find some common but also some complementary differentially expressed gene sets under specific settings. This demonstrates that, because their null hypotheses are complementary, each test captures different aspects of the data; for the analysis of biological data it is therefore beneficial to use all three tests together rather than focusing exclusively on one.
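The three statistics named in the abstract can be illustrated with a short, stdlib-only Python sketch. This is not the authors' implementation. It assumes a pooled-variance two-sample t-statistic per gene, Hotelling's T² with a pooled covariance matrix, and the energy-distance form of the N-statistic with Euclidean distance (which may differ from the paper's exact definition by a constant factor; such a factor does not change a permutation p-value). Significance is assessed here by permuting phenotype labels, which is also an assumption. All function names and the demo data are hypothetical.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]  # rows = samples, columns = genes in the gene set


def column_means(x: Matrix) -> List[float]:
    return [sum(row[j] for row in x) / len(x) for j in range(len(x[0]))]


def sum_squared_t(x: Matrix, y: Matrix) -> float:
    """Sum over genes of squared pooled-variance two-sample t-statistics."""
    n1, n2 = len(x), len(y)
    mx, my = column_means(x), column_means(y)
    total = 0.0
    for j in range(len(mx)):
        ss = sum((r[j] - mx[j]) ** 2 for r in x) + sum((r[j] - my[j]) ** 2 for r in y)
        se = math.sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
        total += ((mx[j] - my[j]) / se) ** 2
    return total


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        z[r] = (m[r][p] - sum(m[r][k] * z[k] for k in range(r + 1, p))) / m[r][r]
    return z


def hotelling_t2(x: Matrix, y: Matrix) -> float:
    """Two-sample Hotelling's T^2 with pooled covariance S; needs n1 + n2 - 2 >= p."""
    n1, n2 = len(x), len(y)
    mx, my = column_means(x), column_means(y)
    p = len(mx)
    s = [[0.0] * p for _ in range(p)]
    for data, mean in ((x, mx), (y, my)):
        for row in data:
            d = [row[j] - mean[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j] / (n1 + n2 - 2)
    diff = [mx[j] - my[j] for j in range(p)]
    z = solve(s, diff)  # z = S^{-1} (mean_x - mean_y)
    return n1 * n2 / (n1 + n2) * sum(d * zj for d, zj in zip(diff, z))


def n_statistic(x: Matrix, y: Matrix) -> float:
    """Energy-distance form: 2 E|X - Y| - E|X - X'| - E|Y - Y'| (Euclidean)."""
    def mean_dist(a: Matrix, b: Matrix) -> float:
        return sum(math.dist(u, v) for u in a for v in b) / (len(a) * len(b))
    return 2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def permutation_p_value(stat: Callable[[Matrix, Matrix], float], x: Matrix,
                        y: Matrix, n_perm: int = 999, seed: int = 0) -> float:
    """Shuffle phenotype labels of whole samples; p = (1 + #{T* >= T}) / (1 + n_perm)."""
    rng = random.Random(seed)
    pooled = x + y
    observed = stat(x, y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: 10 samples per phenotype, a set of 3 genes,
    # with only the first gene shifted in the second phenotype.
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]
    y = [[rng.gauss(0, 1) + (3.0 if j == 0 else 0.0) for j in range(3)] for _ in range(10)]
    for name, stat in [("sum of squared t", sum_squared_t),
                       ("Hotelling's T^2", hotelling_t2), ("N-statistic", n_statistic)]:
        print(name, round(stat(x, y), 3), permutation_p_value(stat, x, y, n_perm=199))
```

In this sketch, the sum of squared t-statistics and Hotelling's T² both target differences in mean expression, while the energy-distance N-statistic compares entire multivariate distributions. This is one way to see why the tests can flag different gene sets. Hotelling's T² also needs enough samples relative to the gene-set size (n1 + n2 - 2 >= p) for the pooled covariance matrix to be invertible.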
Pages: 2348-2354
Page count: 7