PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution
Cited by: 25
Authors:
Hu, Yu [1]
Liu, Yichuan [1]
Mao, Xianyun [1]
Jia, Cheng [1]
Ferguson, Jane F. [2]
Xue, Chenyi [2]
Reilly, Muredach P. [2]
Li, Hongzhe [1]
Li, Mingyao [1]
Affiliations:
[1] Univ Penn, Dept Biostat & Epidemiol, Perelman Sch Med, Philadelphia, PA 19104 USA
[2] Univ Penn, Cardiovasc Inst, Perelman Sch Med, Philadelphia, PA 19104 USA
Funding:
National Institutes of Health (NIH);
Keywords:
DIFFERENTIAL EXPRESSION;
REPRODUCIBILITY;
TRANSCRIPTOME;
INFERENCE;
DOI:
10.1093/nar/gkt1304
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Q7 [Molecular Biology];
Discipline classification codes:
071010 ;
081704 ;
Abstract:
Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by the use of a non-parametric approach. Our rationale is that regardless of what factors lead to non-uniformity, whether it is due to hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth and two real Illumina RNA-Seq data sets, including one with quantitative real-time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
Pages: 14