Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq

Cited by: 46
Authors
Williams, Claire R. [1]
Baccarella, Alyssa [2]
Parrish, Jay Z. [1]
Kim, Charles C. [2, 3]
Affiliations
[1] Univ Washington, Dept Biol, Seattle, WA 98195 USA
[2] Univ Calif San Francisco, Dept Med, Div Expt Med, San Francisco, CA 94143 USA
[3] Verily, San Francisco, CA 94080 USA
Funding
US National Institutes of Health (NIH);
Keywords
Monocytes; Classical; Nonclassical; RNA-Seq; Gene expression analysis; GENE-EXPRESSION; PROFILING REVEALS; QUANTIFICATION; ALIGNMENT; READS; AMPLIFICATION; MICROARRAYS; INFERENCE; HISAT;
DOI
10.1186/s12859-016-1457-z
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: RNA-Seq has supplanted microarrays as the preferred method of transcriptome-wide identification of differentially expressed genes. However, RNA-Seq analysis is still rapidly evolving, with a large number of tools available for each of the three major processing steps: read alignment, expression modeling, and identification of differentially expressed genes. Although some studies have benchmarked these tools against gold standard gene expression sets, few have evaluated their performance in concert with one another. Additionally, there is a general lack of testing of such tools on real-world, physiologically relevant datasets, which often possess qualities not reflected in tightly controlled reference RNA samples or synthetic datasets.
Results: Here, we evaluate 219 combinatorial implementations of the most commonly used analysis tools for their impact on differential gene expression analysis by RNA-Seq. A test dataset was generated using highly purified human classical and nonclassical monocyte subsets from a clinical cohort, allowing us to evaluate the performance of 495 unique workflows when accounting for differences in expression units and gene- versus transcript-level estimation. We find that the choice of methodologies leads to wide variation in the number of genes called significant, as well as in performance as gauged by precision and recall, calculated by comparing our RNA-Seq results to those from four previously published microarray and BeadChip analyses of the same cell populations. The method of differential gene expression identification exhibited the strongest impact on performance, with smaller impacts from the choice of read aligner and expression modeler. Many workflows were found to exhibit similar overall performance, but with differences in their calibration, with some biased toward higher precision and others toward higher recall.
Conclusions: There is significant heterogeneity in the performance of RNA-Seq workflows to identify differentially expressed genes. Among the higher performing workflows, different workflows exhibit a precision/recall tradeoff, and the ultimate choice of workflow should take into consideration how the results will be used in subsequent applications. Our analyses highlight the performance characteristics of these workflows, and the data generated in this study could also serve as a useful resource for future development of software for RNA-Seq analysis.
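The abstract gauges each workflow by precision and recall against previously published reference gene sets. A minimal sketch of that comparison is given below, assuming each workflow yields a set of genes called significant and the reference is a set of genes reported as differentially expressed. The function name and the gene lists are hypothetical illustrations, not the study's data or code.

```python
def precision_recall(called, reference):
    """Return (precision, recall) of a called gene set against a reference set.

    precision = true positives / genes called significant
    recall    = true positives / genes in the reference set
    """
    called, reference = set(called), set(reference)
    true_pos = len(called & reference)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: gene symbols chosen only for illustration.
reference_genes = {"FCGR3A", "CD14", "CX3CR1", "CCR2"}
workflow_calls = {"FCGR3A", "CD14", "CX3CR1", "ACTB", "GAPDH"}

precision, recall = precision_recall(workflow_calls, reference_genes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A workflow that calls fewer genes tends to raise precision at the cost of recall, which is the tradeoff the Conclusions describe.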
Pages: 12