Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

Cited by: 132
Authors
Robles, Jose A. [1]
Qureshi, Sumaira E. [2]
Stephen, Stuart J. [1]
Wilson, Susan R. [2, 3, 4]
Burden, Conrad J. [2]
Taylor, Jennifer M. [1]
Affiliations
[1] CSIRO Plant Ind, Black Mt Labs, Canberra, ACT, Australia
[2] Australian Natl Univ, Inst Math Sci, Canberra, ACT, Australia
[3] Univ New S Wales, Prince Wales Clin Sch, Sydney, NSW, Australia
[4] Univ New S Wales, Sch Math & Stat, Sydney, NSW, Australia
Funding
Australian Research Council
Keywords
RNA-Seq; Differential expression analysis; Sequencing depth; Replication; Experimental design; Multiplex; GENE-EXPRESSION; STATISTICAL-METHODS; SEQ DATA; NORMALIZATION; BIAS; QUANTIFICATION; TRANSCRIPTOMES; REVEALS; PACKAGE; SETS;
DOI
10.1186/1471-2164-13-484
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for detecting differential gene expression, with both high-throughput and high-resolution capabilities possible depending on the experimental design chosen. Multiplex experimental designs are now readily available; these can be used to increase the number of samples or replicates profiled, at the cost of decreased sequencing depth per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods.
Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms, and to quantify the changes in power, in terms of true and false positive rates, when simulating variations in sequencing depth, biological replication and multiplex experimental design choices.
Conclusions: This work quantitatively compares contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to the experimental designs tested, this work strongly suggests that greater power is gained through the use of biological replicates than through library (technical) replicates or sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates.
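The simulation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`poisson`, `nb_count`, `simulate_gene`), the dispersion value, and the way sequencing depth is modelled (scaling the expected count by a `depth_fraction`) are all assumptions made for illustration. Negative binomial counts are drawn as a gamma-Poisson mixture, so the variance is mean + dispersion × mean².

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    if lam > 500.0:
        # Split large rates to avoid underflow in exp(-lam); the sum of two
        # independent Poisson(lam / 2) draws is Poisson(lam).
        return poisson(lam / 2.0, rng) + poisson(lam / 2.0, rng)
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nb_count(mean, dispersion, rng):
    """Negative binomial draw via a gamma-Poisson mixture.

    The gamma rate has shape 1/dispersion and scale mean*dispersion,
    so its expectation equals `mean`.
    """
    if mean <= 0.0:
        return 0
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(lam, rng)


def simulate_gene(base_mean, fold_change, n_reps, depth_fraction, dispersion, rng):
    """Simulate replicate counts for one gene in two conditions.

    Hypothetical model: reduced sequencing depth (for example through
    multiplexing) scales the expected count by `depth_fraction`.
    """
    mean_a = base_mean * depth_fraction
    mean_b = base_mean * fold_change * depth_fraction
    counts_a = [nb_count(mean_a, dispersion, rng) for _ in range(n_reps)]
    counts_b = [nb_count(mean_b, dispersion, rng) for _ in range(n_reps)]
    return counts_a, counts_b


if __name__ == "__main__":
    rng = random.Random(0)
    # Same gene at full depth with 3 replicates, then at 15% depth with 6.
    print(simulate_gene(100.0, 2.0, 3, 1.0, 0.1, rng))
    print(simulate_gene(100.0, 2.0, 6, 0.15, 0.1, rng))
```

Count tables produced this way could then be passed to a differential expression tool to compare designs, for instance fewer deeply sequenced replicates against more shallowly sequenced ones. That comparison step is outside this sketch.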
Pages: 14