PM-Seq: Using Finite Poisson Mixture Models for RNA-Seq Data Analysis and Transcript Expression Level Quantification

Cited by: 5
Authors
Wu H. [1]
Qin Z. [2]
Zhu Y. [1]
Affiliations
[1] Purdue University, West Lafayette
[2] Emory University, Atlanta, GA
Keywords
Bootstrap; EM algorithm; PM-Seq; Poisson mixture model; RNA-Seq; Transcript expression; Transcriptome profiling
DOI
10.1007/s12561-012-9070-9
Abstract
RNA-Seq has emerged as a powerful technique for transcriptome studies. Along with improved sensitivity and coverage, RNA-Seq also brings challenges for data analysis. The massive volume of sequence read data, excessive variability, uncertainty, and bias and noise stemming from multiple sources all make RNA-Seq data difficult to analyze. Despite much progress, RNA-Seq data analysis still has much room for improvement, especially in the quantification of transcript/gene expression levels. In this article, using finite Poisson mixture models, we propose a two-step approach, called PM-Seq, for characterizing base-pair-level RNA-Seq data and quantifying transcript/gene expression levels. Finite Poisson mixture models combine the strength of fully parametric models with the flexibility of fully nonparametric models, and are especially well suited to modeling heterogeneous count data such as RNA-Seq data. In particular, we consider three types of Poisson mixture models and propose a BIC-based model selection procedure to adapt the models to individual transcripts. A unified quantification method based on the Poisson mixture models is developed to measure transcript/gene expression levels. The Poisson mixture models and the proposed quantification method were applied to two RNA-Seq data sets and demonstrated excellent performance in comparison with other existing methods: our approach gave a better characterization of the data and more accurate measurements of transcript expression levels. We believe that finite Poisson mixture models provide a flexible framework for modeling RNA-Seq data, and that methods built on this framework have the potential to become powerful tools for RNA-Seq data analysis. © 2012 International Chinese Statistical Association.
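The abstract describes modeling count data with finite Poisson mixtures fitted by EM and choosing among candidate models with BIC. As a rough illustration of that general idea only, the sketch below fits a plain K-component Poisson mixture to a vector of counts by EM and picks K by BIC. It is not the PM-Seq implementation: the paper's three specific model types and its quantification step are not reproduced, and all function names (`fit_poisson_mixture`, `select_by_bic`, `sample_poisson`) and the simulated counts are hypothetical.

```python
import math
import random


def poisson_logpmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam); math.lgamma(k + 1) equals log(k!)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def e_step(counts, weights, rates):
    """Return (responsibilities, log-likelihood) under the current parameters."""
    resp, ll = [], 0.0
    for k in counts:
        logs = [math.log(w) + poisson_logpmf(k, lam) for w, lam in zip(weights, rates)]
        top = max(logs)
        log_total = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
        ll += log_total
        resp.append([math.exp(v - log_total) for v in logs])
    return resp, ll


def fit_poisson_mixture(counts, n_components, n_iter=300, tol=1e-8):
    """Fit a K-component finite Poisson mixture by EM; return (weights, rates, loglik)."""
    n = len(counts)
    ordered = sorted(counts)
    # Deterministic start (an arbitrary choice): one rate per quantile band, with a
    # tiny offset so that no two components start out identical.
    rates = [ordered[int((j + 0.5) * n / n_components)] + 0.5 + 0.01 * j
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components
    prev_ll = -math.inf
    for _ in range(n_iter):
        resp, ll = e_step(counts, weights, rates)
        if ll - prev_ll < tol:  # EM should not decrease the likelihood; stop when it stalls
            break
        prev_ll = ll
        # M-step: closed-form updates; floors keep the logs finite for empty components.
        sizes = [max(sum(r[j] for r in resp), 1e-12) for j in range(n_components)]
        total = sum(sizes)
        weights = [s / total for s in sizes]
        rates = [max(sum(r[j] * k for r, k in zip(resp, counts)) / sizes[j], 1e-6)
                 for j in range(n_components)]
    _, ll = e_step(counts, weights, rates)
    return weights, rates, ll


def bic(ll, n_components, n):
    """BIC = -2 log L + p log n, with p = K rates + (K - 1) free mixing weights."""
    return -2.0 * ll + (2 * n_components - 1) * math.log(n)


def select_by_bic(counts, max_components=4):
    """Fit K = 1..max_components; return (K, weights, rates, bic) with the lowest BIC."""
    fits = []
    for k in range(1, max_components + 1):
        weights, rates, ll = fit_poisson_mixture(counts, k)
        fits.append((k, weights, rates, bic(ll, k, len(counts))))
    return min(fits, key=lambda fit: fit[3])


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical per-base read counts: a low-coverage and a high-coverage region.
    counts = ([sample_poisson(2.0, rng) for _ in range(300)]
              + [sample_poisson(20.0, rng) for _ in range(300)])
    k, weights, rates, score = select_by_bic(counts)
    print(k, [round(r, 2) for r in sorted(rates)])
```

Each extra component costs two parameters (one rate, one weight), and the BIC penalty scales with `log(n)`, which is what keeps the selection from simply preferring the largest K on a single vector of counts.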
Pages: 71-87
Page count: 16