Bootstrap-based differential gene expression analysis for RNA-Seq data with and without replicates

Cited by: 29
Authors
Al Seesi, Sahar [1]
Tiagueu, Yvette Temate [2]
Zelikovsky, Alexander [2]
Mandoiu, Ion I. [1]
Affiliations
[1] Univ Connecticut, Dept Comp Engn & Sci, Storrs, CT 06269 USA
[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Funding
National Institute of Food and Agriculture (USA); National Science Foundation (USA)
Keywords
SPLICING ISOFORM FREQUENCIES;
DOI
10.1186/1471-2164-15-S8-S2
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
A major application of RNA-Seq is differential gene expression analysis. Many tools exist to identify differentially expressed genes in the presence of biological replicates. Frequently, however, RNA-Seq experiments have few or no biological replicates, and developing methods for detecting differentially expressed genes in these scenarios is still an active research area. In this paper we introduce a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. We compared IsoDE against four existing methods (Fisher's exact test, GFOLD, edgeR and Cuffdiff) on RNA-Seq datasets generated using three different sequencing technologies, both with and without replicates. Experiments on MAQC RNA-Seq datasets without replicates show that IsoDE has consistently high accuracy as defined by the qPCR ground truth, frequently higher than that of the compared methods, particularly for low coverage data and at lower fold change thresholds. In experiments on RNA-Seq datasets with up to 7 replicates, IsoDE also achieved high accuracy. Furthermore, unlike that of GFOLD and edgeR, the accuracy of IsoDE varies smoothly with the number of replicates and is relatively uniform across the entire range of gene expression levels. The proposed non-parametric method based on bootstrapping has practical running time and achieves robust performance over a broad range of technologies, numbers of replicates, sequencing depths, and minimum fold change thresholds.
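The general idea of bootstrap-based differential expression calling without replicates can be illustrated with a minimal sketch. This is not IsoDE's actual procedure, which the abstract does not spell out. It assumes reads have already been assigned to genes, uses a simple read fraction with a pseudocount as a stand-in expression estimate, and all names here (`bootstrap_fold_change_support`, `n_boot`, `min_fc`) are hypothetical.

```python
import random


def bootstrap_fold_change_support(reads_a, reads_b, gene,
                                  n_boot=200, min_fc=2.0, seed=0):
    """Fraction of bootstrap sample pairs whose fold change for `gene`
    reaches `min_fc` in the dominant direction (up or down).

    reads_a, reads_b: lists of gene labels, one label per read
    (an assumed, simplified input format).
    """
    rng = random.Random(seed)

    def resampled_expression(reads):
        # Resample reads with replacement, then estimate expression as the
        # fraction of reads from `gene`; the pseudocount avoids division by 0.
        sample = rng.choices(reads, k=len(reads))
        return (sample.count(gene) + 1) / (len(sample) + 1)

    up = down = 0
    for _ in range(n_boot):
        ratio = resampled_expression(reads_a) / resampled_expression(reads_b)
        if ratio >= min_fc:
            up += 1
        elif ratio <= 1.0 / min_fc:
            down += 1
    return max(up, down) / n_boot


if __name__ == "__main__":
    # Toy data: g1 makes up 30% of reads in condition A and 5% in condition B.
    reads_a = ["g1"] * 300 + ["g2"] * 700
    reads_b = ["g1"] * 50 + ["g2"] * 950
    for g in ("g1", "g2"):
        support = bootstrap_fold_change_support(reads_a, reads_b, g)
        print(g, "bootstrap support:", support)
```

A gene would be called differentially expressed when its support exceeds a chosen cutoff; both the cutoff and the fold change threshold are tuning choices in this sketch, not values taken from the paper.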
Pages: 10