Estimation of alternative splicing isoform frequencies from RNA-Seq data

Cited by: 92
Authors
Nicolae, Marius [1]
Mangul, Serghei [2]
Mandoiu, Ion I. [1]
Zelikovsky, Alex [2]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Source
ALGORITHMS FOR MOLECULAR BIOLOGY | 2011 / Vol. 6
Funding
National Science Foundation (USA);
Keywords
SHORT SEQUENCE READS; EXPRESSION LEVELS; GENE-EXPRESSION; TRANSCRIPTOME; QUANTIFICATION; RECONSTRUCTION; REVEALS; GENOME;
DOI
10.1186/1748-7188-6-9
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging.
Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/.
Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
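The abstract names expectation-maximization as the core technique but gives no details. The following is a minimal, generic sketch of EM for isoform frequency estimation, not the IsoEM implementation. The function name, the input layout (one dict of per-isoform compatibility weights per read), and the plain length normalization in the M-step are all assumptions made for illustration; in particular, the weights here are arbitrary numbers standing in for whatever insert-size, quality-score, strand and pairing information a real tool would use.

```python
def estimate_isoform_frequencies(read_weights, lengths, iterations=200):
    """Generic EM sketch (hypothetical helper, not IsoEM's code).

    read_weights: list of dicts, one per read, mapping isoform id to a
                  non-negative compatibility weight for that read.
    lengths:      dict mapping isoform id to isoform length in bases.
    Returns a dict of estimated isoform frequencies that sum to 1.
    """
    isoforms = sorted(lengths)
    # Start from a uniform guess.
    freq = {j: 1.0 / len(isoforms) for j in isoforms}

    for _ in range(iterations):
        # E-step: split each read fractionally among its compatible
        # isoforms, in proportion to current frequency times weight.
        counts = {j: 0.0 for j in isoforms}
        for weights in read_weights:
            total = sum(freq[j] * w for j, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible under current estimates
            for j, w in weights.items():
                counts[j] += freq[j] * w / total

        # M-step: convert expected read counts to frequencies by
        # normalizing by isoform length (assumed simplification).
        scaled = {j: counts[j] / lengths[j] for j in isoforms}
        norm = sum(scaled.values())
        if norm == 0.0:
            break
        freq = {j: scaled[j] / norm for j in isoforms}

    return freq


# Toy input: 30 reads unique to A, 10 unique to B, 60 ambiguous.
toy_reads = [{"A": 1.0}] * 30 + [{"B": 1.0}] * 10 + [{"A": 1.0, "B": 1.0}] * 60
toy_freq = estimate_isoform_frequencies(toy_reads, {"A": 1000, "B": 1000})
```

On this toy input, with two isoforms of equal length, the ambiguous reads are split in the same 3:1 ratio as the unique reads, so the estimates converge toward 0.75 for A and 0.25 for B.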
Pages: 13