A Robust Method for Transcript Quantification with RNA-Seq Data

Cited by: 11
Authors
Huang, Yan [1]
Hu, Yin [1]
Jones, Corbin D. [3]
MacLeod, James N. [2]
Chiang, Derek Y. [4]
Liu, Yufeng [5]
Prins, Jan F. [6]
Liu, Jinze [1]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] Univ Kentucky, Dept Vet Sci, Lexington, KY 40506 USA
[3] Univ N Carolina, Dept Biol, Chapel Hill, NC USA
[4] Univ N Carolina, Dept Genet, Chapel Hill, NC USA
[5] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC USA
[6] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC USA
Funding
National Science Foundation (USA); National Institutes of Health (USA);
Keywords
transcriptome; transcript quantification; RNA-seq; EXPRESSION ESTIMATION; ISOFORM EXPRESSION; REVEALS; IMPROVE; TOOL;
DOI
10.1089/cmb.2012.0230
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation, or transcript quantification, of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains challenging for several reasons. First, although various algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique solution even when every read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to learn bias parameters simultaneously during transcript quantification to improve accuracy. Third, transcript quantification is often given a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. On simulated RNA-seq datasets, our method achieved better quantification accuracy and more accurate inference of the dominant set of transcripts than existing methods. Application of our method to real data demonstrated that transcript quantification is effective for differential analysis of transcriptomes.
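The linear-model and LASSO ideas in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the five-exon gene, the read categories, the unit weights in the design matrix, and every name (`ISOFORMS`, `SHORT`, `LONG`, `compatible`, `design_matrix`, `predict`, `rank`, `nonneg_lasso`) are hypothetical, bias parameters are not modeled, and the LASSO is solved by plain non-negative coordinate descent. Under those assumptions, it shows that reads covering only single exons or single junctions leave four isoforms unidentifiable (rank 3), that reads spanning two junctions restore full rank, and that a positive L1 penalty shrinks the estimates while keeping weakly supported isoforms at exactly zero.

```python
from fractions import Fraction

# Hypothetical toy gene (not from the paper): exons 1..5, where exons 2 and 4
# are independent cassette exons. Each isoform is its exon path, in order.
ISOFORMS = {
    "T1": (1, 2, 3, 4, 5),
    "T2": (1, 3, 5),
    "T3": (1, 2, 3, 5),
    "T4": (1, 3, 4, 5),
}

# A read category is the run of consecutive exons a read covers.
# SHORT: reads that cover a single exon or a single splice junction.
SHORT = [(1,), (2,), (3,), (4,), (5,),
         (1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
# LONG: also reads spanning two junctions (plausible if exon 3 is short).
LONG = SHORT + [(2, 3, 4), (1, 3, 5), (2, 3, 5), (1, 3, 4)]


def compatible(isoform, path):
    """True if the exon run `path` occurs contiguously in `isoform`."""
    n = len(path)
    return any(isoform[k:k + n] == path for k in range(len(isoform) - n + 1))


def design_matrix(categories, isoforms):
    """A[j][i] = 1.0 if read category j is compatible with isoform i, else 0.0.

    Assumption: unit weights. A fuller model would scale each entry by the
    number of read start positions in the category and by bias terms.
    """
    return [[1.0 if compatible(iso, cat) else 0.0 for iso in isoforms.values()]
            for cat in categories]


def predict(A, x):
    """Expected read count per category under the linear model y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def nonneg_lasso(A, y, lam, sweeps=2000):
    """Minimize 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0.

    Cyclic coordinate descent; with x >= 0 the L1 norm is just sum(x).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    col_sq = [sum(A[j][i] ** 2 for j in range(m)) for i in range(n)]
    resid = list(y)  # y - A x, with x = 0 initially
    for _ in range(sweeps):
        for i in range(n):
            if col_sq[i] == 0:
                continue
            # Inner product of column i with the residual that excludes x_i.
            rho = sum(A[j][i] * (resid[j] + A[j][i] * x[i]) for j in range(m))
            new = max(0.0, (rho - lam) / col_sq[i])  # thresholded update
            delta = new - x[i]
            if delta:
                for j in range(m):
                    resid[j] -= A[j][i] * delta
                x[i] = new
    return x


if __name__ == "__main__":
    A_short = design_matrix(SHORT, ISOFORMS)
    A_long = design_matrix(LONG, ISOFORMS)
    print(rank(A_short), rank(A_long))  # 3 4
    # Moving expression from T3+T4 to T1+T2 leaves SHORT counts unchanged.
    print(predict(A_short, [4, 4, 4, 4]) == predict(A_short, [6, 6, 2, 2]))  # True
    y = predict(A_long, [10, 0, 5, 0])  # noise-free counts from a known truth
    print(nonneg_lasso(A_long, y, lam=0.0))  # close to [10, 0, 5, 0]
    print(nonneg_lasso(A_long, y, lam=1.0))  # shrunk; T2 and T4 stay at 0.0
```

With `lam=0.0` the solver reduces to non-negative least squares; increasing `lam` trades goodness of fit for sparsity, which is the general mechanism by which an L1 penalty favors a small set of expressed isoforms. How the penalty is chosen in practice (e.g., by cross-validation) is left open in this sketch.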
Pages: 167-187
Page count: 21