iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data

Cited by: 82
Authors
Mezlini, Aziz M. [1, 2, 3]
Smith, Eric J. M. [1]
Fiume, Marc [1]
Buske, Orion [1]
Savich, Gleb L. [4]
Shah, Sohrab [5, 6]
Aparicio, Sam [5, 6]
Chiang, Derek Y. [4]
Goldenberg, Anna [1, 3]
Brudno, Michael [1, 2, 3, 7]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
[2] Hosp Sick Children, Ctr Computat Med, Toronto, ON M5G 1L7, Canada
[3] Hosp Sick Children, Toronto, ON M5G 1L7, Canada
[4] Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
[5] BC Canc Agcy, Dept Mol Oncol, Vancouver, BC V5Z 1L3, Canada
[6] Univ British Columbia, Dept Pathol, Vancouver, BC V6T 2B5, Canada
[7] Univ Toronto, Donnelly Ctr, Toronto, ON M5S 3E1, Canada
Keywords
SAVANT GENOME BROWSER; CANCER; MUTATIONS; TOOL;
DOI
10.1101/gr.142232.112
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. The realization of this promise, however, is conditional on the development of effective computational methods for the identification and quantification of transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneous determination of the isoforms and estimation of their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon utilizes regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with a significantly reduced number of false-positive predictions, and our abundance accuracy prediction outmatches that of other state-of-the-art tools. Furthermore, we have applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon is able to reconstruct the complex splicing changes that were not previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement (r² = 0.94) with the predicted abundances.
Pages: 519-529
Page count: 11