Comparative evaluation of gene set analysis approaches for RNA-Seq data

Cited by: 22
Authors
Rahmatallah, Yasir [1]
Emmert-Streib, Frank [2]
Glazko, Galina [1]
Affiliations
[1] Univ Arkansas Med Sci, Div Biomed Informat, Little Rock, AR 72205 USA
[2] Queens Univ Belfast, Computat Biol & Machine Learning Lab, Ctr Canc Res & Cell Biol, Sch Med Dent & Biomed Sci, Belfast BT9 7BL, Antrim, North Ireland
Funding
Engineering and Physical Sciences Research Council (UK); National Science Foundation (USA);
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; MICROARRAY DATA; NORMALIZATION; TESTS;
DOI
10.1186/s12859-014-0397-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Over the last few years, transcriptome sequencing (RNA-Seq) has almost completely replaced microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes that are differentially expressed between two or more conditions. Despite the importance of Gene Set Analysis (GSA) in interpreting the results of RNA-Seq experiments, the limitations of GSA methods developed for microarrays in the context of RNA-Seq data are not well understood. Results: We provide a thorough evaluation of popular multivariate and gene-level self-contained GSA approaches on simulated and real RNA-Seq data. The multivariate approach employs multivariate non-parametric tests combined with popular normalizations for RNA-Seq data. The gene-level approach uses univariate tests designed for the analysis of RNA-Seq data to find gene-specific P-values and combines them into a pathway P-value using classical statistical techniques. Our results demonstrate that the Type I error rate and the power of multivariate tests depend only on the test statistics and are insensitive to the different normalizations. In general, standard multivariate GSA tests detect pathways without bias in terms of pathway size, percentage of differentially expressed genes, or average gene length in a pathway. In contrast, the Type I error rate and the power of gene-level GSA tests are heavily affected by the method used to combine P-values, and all of the aforementioned biases are present in the detected pathways. Conclusions: Our results emphasize the importance of using self-contained non-parametric multivariate tests for detecting differentially expressed pathways in RNA-Seq data and warn against applying gene-level GSA tests, especially because of their high Type I error rates for both simulated and real data.
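Illustrative note: the abstract describes the gene-level approach as combining gene-specific P-values into a single pathway P-value "using classical statistical techniques" without naming them. As a minimal sketch, assuming Fisher's method as one such classical technique (the abstract does not confirm which methods were used), the combination step might look like the following. The function name `fisher_combine` and the example P-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine P-values into one with Fisher's method (illustrative sketch).

    Assumption: the individual tests are independent. Genes within a
    pathway are often correlated, so this assumption may not hold.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null it follows
    # a chi-square distribution with 2k degrees of freedom. We keep X / 2.
    half_stat = -sum(math.log(p) for p in p_values)

    # For even degrees of freedom the chi-square survival function has a
    # closed form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene-level P-values for one pathway.
pathway_p = fisher_combine([0.04, 0.20, 0.01, 0.65])
print(f"combined pathway P-value: {pathway_p:.4g}")
```

Because Fisher's method treats the gene-level tests as independent, it may be sensitive to correlation among genes in the same pathway, which is one plausible reason the choice of combining method matters for Type I error.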
Pages: 15