Prediction and Quantification of Splice Events from RNA-Seq Data

Cited by: 98
Authors
Goldstein, Leonard D. [1 ,2 ]
Cao, Yi [1 ]
Pau, Gregoire [1 ]
Lawrence, Michael [1 ]
Wu, Thomas D. [1 ]
Seshagiri, Somasekar [2 ]
Gentleman, Robert [1 ,3 ]
Affiliations
[1] Genentech Inc, Dept Bioinformat & Computat Biol, San Francisco, CA 94080 USA
[2] Genentech Inc, Dept Mol Biol, San Francisco, CA 94080 USA
[3] 23andMe Inc, Mountain View, CA USA
Keywords
TRANSCRIPTOME; ALIGNMENT; BROWSER;
DOI
10.1371/journal.pone.0156132
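Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code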
07 ; 0710 ; 09 ;
Abstract
Analysis of splice variants from short read RNA-seq data remains a challenging problem. Here we present a novel method for the genome-guided prediction and quantification of splice events from RNA-seq data, which enables the analysis of unannotated and complex splice events. Splice junctions and exons are predicted from reads mapped to a reference genome and are assembled into a genome-wide splice graph. Splice events are identified recursively from the graph and are quantified locally based on reads extending across the start or end of each splice variant. We assess prediction accuracy based on simulated and real RNA-seq data, and illustrate how different read aligners (GSNAP, HISAT2, STAR, TopHat2) affect prediction results. We validate our approach for quantification based on simulated data, and compare local estimates of relative splice variant usage with those from other methods (MISO, Cufflinks) based on simulated and real RNA-seq data. In a proof-of-concept study of splice variants in 16 normal human tissues (Illumina Body Map 2.0) we identify 249 internal exons that belong to known genes but are not related to annotated exons. Using independent RNA samples from 14 matched normal human tissues, we validate 9/9 of these exons by RT-PCR and 216/249 by paired-end RNA-seq (2 x 250 bp). These results indicate that de novo prediction of splice variants remains beneficial even in well-studied systems. An implementation of our method is freely available as an R/Bioconductor package SGSeq.
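The local quantification described in the abstract can be illustrated with a small sketch. It assumes a plain ratio estimator: each variant's usage is its share of the reads that extend across a boundary shared by all variants of the event. The function name `relative_usage`, the variant labels, and the read counts are hypothetical and chosen only for illustration; this is not SGSeq's API, and the estimator used by the package may differ.

```python
from typing import Dict


def relative_usage(boundary_counts: Dict[str, int]) -> Dict[str, float]:
    """Return each variant's share of the reads spanning a shared event boundary.

    boundary_counts maps a variant label to the number of reads that extend
    across the start (or end) of that variant. Assumption: a simple ratio
    estimator, not necessarily the one implemented in SGSeq.
    """
    total = sum(boundary_counts.values())
    if total == 0:
        # No reads cover the event, so relative usage is undefined.
        return {variant: float("nan") for variant in boundary_counts}
    return {variant: count / total for variant, count in boundary_counts.items()}


# Hypothetical counts for a skipped-exon event with two variants.
counts = {"inclusion": 30, "skipping": 10}
usage = relative_usage(counts)
```

Because only reads at the event boundary are counted, an estimate of this kind depends on local coverage rather than on full-length transcript reconstruction, which is consistent with the abstract's emphasis on unannotated and complex splice events.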
Pages: 18