Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data

Times cited: 280
Authors
Everaert, Celine [1, 2, 3]
Luypaert, Manuel [4]
Maag, Jesper L. V. [5]
Cheng, Quek Xiu [5]
Dinger, Marcel E. [5]
Hellemans, Jan [4]
Mestdagh, Pieter [1, 2, 3]
Affiliations
[1] Univ Ghent, Ctr Med Genet, Ghent, Belgium
[2] Univ Ghent, Canc Res Inst Ghent, Ghent, Belgium
[3] Univ Ghent, Bioinformat Inst Ghent N2N, Ghent, Belgium
[4] Biogazelle, Ghent, Belgium
[5] Kinghorn Canc Ctr, Sydney, NSW, Australia
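Keywords
QUANTIFICATION; TOPHAT; GENE
DOI
10.1038/s41598-017-01617-3
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract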
RNA-sequencing has become the gold standard for whole-transcriptome gene expression quantification. Multiple algorithms have been developed to derive gene counts from sequencing reads. While a number of benchmarking studies have been conducted, the question remains how individual methods perform at accurately quantifying gene expression levels from RNA-sequencing reads. We performed an independent benchmarking study using RNA-sequencing data from the well-established MAQCA and MAQCB reference samples. RNA-sequencing reads were processed using five workflows (Tophat-HTSeq, Tophat-Cufflinks, STAR-HTSeq, Kallisto and Salmon) and resulting gene expression measurements were compared to expression data generated by wet-lab validated qPCR assays for all protein coding genes. All methods showed high gene expression correlations with qPCR data. When comparing gene expression fold changes between MAQCA and MAQCB samples, about 85% of the genes showed consistent results between RNA-sequencing and qPCR data. Of note, each method revealed a small but specific gene set with inconsistent expression measurements. A significant proportion of these method-specific inconsistent genes were reproducibly identified in independent datasets. These genes were typically smaller, had fewer exons, and were lower expressed compared to genes with consistent expression measurements. We propose that careful validation is warranted when evaluating RNA-seq-based expression profiles for this specific gene set.
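The fold-change comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the pseudocount, the |log2 fold change| threshold of 1, and the three-way up/down/unchanged classification are all assumptions chosen for illustration, and both inputs are assumed to be linear-scale expression values rather than raw Cq values or read counts.

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """Log2 ratio of expression in sample A over sample B.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


def status(log2fc, threshold=1.0):
    """Classify a gene as up, down, or unchanged (threshold is hypothetical)."""
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


def concordance(rnaseq, qpcr, threshold=1.0):
    """Compare fold-change status between two platforms.

    Both arguments map gene name -> (expression in A, expression in B).
    Returns the fraction of shared genes with the same status, plus the
    sorted list of genes whose status differs.
    """
    shared = sorted(set(rnaseq) & set(qpcr))
    if not shared:
        raise ValueError("no genes shared between the two datasets")
    inconsistent = [
        gene
        for gene in shared
        if status(log2_fold_change(*rnaseq[gene]), threshold)
        != status(log2_fold_change(*qpcr[gene]), threshold)
    ]
    fraction = (len(shared) - len(inconsistent)) / len(shared)
    return fraction, inconsistent
```

Calling `concordance(rnaseq, qpcr)` on two such dictionaries yields a consistency fraction comparable in spirit to the roughly 85% figure reported above, along with the genes that would merit follow-up validation.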
Pages: 11