PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Times cited: 25
Authors
Hu, Yu [1]
Liu, Yichuan [1]
Mao, Xianyun [1]
Jia, Cheng [1]
Ferguson, Jane F. [2]
Xue, Chenyi [2]
Reilly, Muredach P. [2]
Li, Hongzhe [1]
Li, Mingyao [1]
Affiliations
[1] Univ Penn, Dept Biostat & Epidemiol, Perelman Sch Med, Philadelphia, PA 19104 USA
[2] Univ Penn, Cardiovasc Inst, Perelman Sch Med, Philadelphia, PA 19104 USA
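Funding
National Institutes of Health (NIH)
Keywords
DIFFERENTIAL EXPRESSION; REPRODUCIBILITY; TRANSCRIPTOME; INFERENCE
DOI
10.1093/nar/gkt1304
Chinese Library Classification numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract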
Correctly estimating isoform-specific gene expression is important for understanding complex biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis and, if not appropriately corrected, can distort isoform expression estimates and downstream analyses. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we use a non-parametric approach that lets the observed data determine this distribution. Our rationale is that regardless of what causes the non-uniformity, whether hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown factors, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using simulated data with known ground truth and two real Illumina RNA-Seq data sets, one of which includes quantitative real-time polymerase chain reaction measurements. Our results indicate that PennSeq outperforms existing methods, particularly for isoforms with severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
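The abstract's central idea, that each isoform's sampling non-uniformity can be read off the aligned data rather than parameterized, can be illustrated with a small expectation-maximization (EM) sketch. This is not PennSeq's published algorithm. The equal-width binning, the pseudocount, the rule of learning each isoform's start-position histogram only from reads that align to that isoform alone, and all function names (`em_abundance`, `empirical_bin_probs`, `start_prob`, etc.) are hypothetical choices made for illustration. Reads are assumed to be already converted to 0-based start positions in transcript coordinates.

```python
from collections import defaultdict


def bin_of(pos, length, n_bins):
    """Equal-width bin index (0..n_bins-1) of a start position, assuming 0 <= pos < length."""
    return pos * n_bins // length


def bin_size(b, length, n_bins):
    """Number of integer positions p in [0, length) with bin_of(p) == b."""
    lo = -(-b * length // n_bins)          # ceil(b * length / n_bins)
    hi = -(-(b + 1) * length // n_bins)    # ceil((b + 1) * length / n_bins)
    return hi - lo


def empirical_bin_probs(positions, length, n_bins, pseudocount=1.0):
    """Histogram (non-parametric) estimate of where fragments start along one isoform."""
    counts = [pseudocount] * n_bins
    for p in positions:
        counts[bin_of(p, length, n_bins)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def start_prob(pos, length, probs):
    """P(start = pos | isoform): each bin's mass spread evenly over the positions in that bin."""
    b = bin_of(pos, length, len(probs))
    return probs[b] / bin_size(b, length, len(probs))


def em_abundance(reads, isoforms, n_bins=4, n_iter=200):
    """
    reads:    list of reads; each read is a list of (isoform, start_pos) alignments.
    isoforms: dict mapping isoform name -> transcript length.
    Returns the estimated fraction of fragments originating from each isoform.
    """
    # Hypothetical heuristic: learn each isoform's start-position histogram
    # from the reads that align to that isoform only.
    unique = defaultdict(list)
    for aligns in reads:
        if len(aligns) == 1:
            iso, pos = aligns[0]
            unique[iso].append(pos)
    probs = {k: empirical_bin_probs(unique[k], length, n_bins)
             for k, length in isoforms.items()}

    theta = {k: 1.0 / len(isoforms) for k in isoforms}
    for _ in range(n_iter):
        expected = {k: 0.0 for k in isoforms}
        for aligns in reads:
            # E-step: posterior share of this read for each compatible isoform,
            # weighted by that isoform's own (non-uniform) start distribution.
            w = defaultdict(float)
            for k, pos in aligns:
                w[k] += theta[k] * start_prob(pos, isoforms[k], probs[k])
            z = sum(w.values())
            for k, wk in w.items():
                expected[k] += wk / z
        # M-step: proportions are the normalized expected fragment counts.
        total = sum(expected.values())
        theta = {k: e / total for k, e in expected.items()}
    return theta
```

For example, take two 100-bp isoforms A and B, two reads unique to each (A's starting near the 3' end at positions 90 and 95, B's near the 5' end at 10 and 15), and four shared reads starting at position 92 in both. `em_abundance(reads, {"A": 100, "B": 100})` then assigns A about 0.68 of the fragments, because the shared reads fall where A's empirical distribution is dense; with `n_bins=1` (a uniform model) the two isoforms are indistinguishable and each receives 0.5. The pseudocount keeps every bin's probability positive, so no compatible alignment is ever given zero likelihood.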
Pages: 14