PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Cited by: 25
Authors
Hu, Yu [1]
Liu, Yichuan [1]
Mao, Xianyun [1]
Jia, Cheng [1]
Ferguson, Jane F. [2]
Xue, Chenyi [2]
Reilly, Muredach P. [2]
Li, Hongzhe [1]
Li, Mingyao [1]
Affiliations
[1] Univ Penn, Dept Biostat & Epidemiol, Perelman Sch Med, Philadelphia, PA 19104 USA
[2] Univ Penn, Cardiovasc Inst, Perelman Sch Med, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (NIH), USA
Keywords
DIFFERENTIAL EXPRESSION; REPRODUCIBILITY; TRANSCRIPTOME; INFERENCE;
DOI
10.1093/nar/gkt1304
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis, and if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by the use of a non-parametric approach. Our rationale is that regardless of what factors lead to non-uniformity, whether it is due to hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth and two real Illumina RNA-Seq data sets, including one with quantitative real-time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
Pages: 14