RNAontheBENCH: computational and empirical resources for benchmarking RNAseq quantification and differential expression methods

Cited by: 28
Authors
Germain, Pierre-Luc [1]
Vitriolo, Alessandro [1,2]
Adamo, Antonio [1,3]
Laise, Pasquale [1]
Das, Vivek [1,2]
Testa, Giuseppe [1,2]
Affiliations
[1] European Inst Oncol, Dept Expt Oncol, Via Adamello 16, I-20139 Milan, Italy
[2] Univ Milan, Dept Oncol & Hematooncol, Via Festa Perdono 7, I-20122 Milan, Italy
[3] King Abdullah Univ Sci & Technol, Biol & Environm Sci & Engn Div, Thuwal 239556900, Saudi Arabia
Funding
European Research Council
Keywords
GENE-EXPRESSION; SEQ; ALIGNMENT; PROGRAMS;
DOI
10.1093/nar/gkw448
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010 ; 081704 ;
Abstract
RNA sequencing (RNAseq) has become the method of choice for transcriptome analysis, yet no consensus exists as to the most appropriate pipeline for its analysis, with current benchmarks suffering important limitations. Here, we address these challenges through a rich benchmarking resource harnessing (i) two RNAseq datasets including ERCC ExFold spike-ins; (ii) Nanostring measurements of a panel of 150 genes on the same samples; (iii) a set of internal, genetically-determined controls; (iv) a reanalysis of the SEQC dataset; and (v) a focus on relative quantification (i.e. across-samples). We use this resource to compare different approaches to each step of RNAseq analysis, from alignment to differential expression testing. We show that methods providing the best absolute quantification do not necessarily provide good relative quantification across samples, that count-based methods are superior for gene-level relative quantification, and that the new generation of pseudo-alignment-based software performs as well as established methods, at a fraction of the computing time. We also assess the impact of library type and size on quantification and differential expression analysis. Finally, we have created an R package and a web platform to enable the simple and streamlined application of this resource to the benchmarking of future methods.
Pages: 5054-5067
Page count: 14
Related papers
40 in total
[1]   7q11.23 dosage-dependent dysregulation in human pluripotent stem cells affects transcriptional programs in disease-relevant lineages [J].
Adamo, Antonio ;
Atashpaz, Sina ;
Germain, Pierre-Luc ;
Zanella, Matteo ;
D'Agostino, Giuseppe ;
Albertin, Veronica ;
Chenoweth, Josh ;
Micale, Lucia ;
Fusco, Carmela ;
Unger, Christian ;
Augello, Bartolomeo ;
Palumbo, Orazio ;
Hamilton, Brad ;
Carella, Massimo ;
Donti, Emilio ;
Pruneri, Giancarlo ;
Selicorni, Angelo ;
Biamino, Elisa ;
Prontera, Paolo ;
Mckay, Ronald ;
Merla, Giuseppe ;
Testa, Giuseppe .
NATURE GENETICS, 2015, 47 (02) :132-141
[2]   Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[3]   HTSeq-a Python framework to work with high-throughput sequencing data [J].
Anders, Simon ;
Pyl, Paul Theodor ;
Huber, Wolfgang .
BIOINFORMATICS, 2015, 31 (02) :166-169
[4]   STAR: ultrafast universal RNA-seq aligner [J].
Dobin, Alexander ;
Davis, Carrie A. ;
Schlesinger, Felix ;
Drenkow, Jorg ;
Zaleski, Chris ;
Jha, Sonali ;
Batut, Philippe ;
Chaisson, Mark ;
Gingeras, Thomas R. .
BIOINFORMATICS, 2013, 29 (01) :15-21
[5]  
Engström PG, 2013, NAT METHODS, V10, P1185, DOI 10.1038/nmeth.2722
[6]   Polyester: simulating RNA-seq datasets with differential transcript expression [J].
Frazee, Alyssa C. ;
Jaffe, Andrew E. ;
Langmead, Ben ;
Leek, Jeffrey T. .
BIOINFORMATICS, 2015, 31 (17) :2778-2784
[7]   Direct multiplexed measurement of gene expression with color-coded probe pairs [J].
Geiss, Gary K. ;
Bumgarner, Roger E. ;
Birditt, Brian ;
Dahl, Timothy ;
Dowidar, Naeem ;
Dunaway, Dwayne L. ;
Fell, H. Perry ;
Ferree, Sean ;
George, Renee D. ;
Grogan, Tammy ;
James, Jeffrey J. ;
Maysuria, Malini ;
Mitton, Jeffrey D. ;
Oliveri, Paola ;
Osborn, Jennifer L. ;
Peng, Tao ;
Ratcliffe, Amber L. ;
Webster, Philippa J. ;
Davidson, Eric H. ;
Hood, Leroy .
NATURE BIOTECHNOLOGY, 2008, 26 (03) :317-325
[8]   Comparison of Affymetrix GeneChip expression measures [J].
Irizarry, RA ;
Wu, ZJ ;
Jaffee, HA .
BIOINFORMATICS, 2006, 22 (07) :789-794
[9]   Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data [J].
Kanitz, Alexander ;
Gypas, Foivos ;
Gruber, Andreas J. ;
Gruber, Andreas R. ;
Martin, Georges ;
Zavolan, Mihaela .
GENOME BIOLOGY, 2015, 16
[10]  
Kim D, 2015, NAT METHODS, V12, P357, DOI 10.1038/nmeth.3317