StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Citations: 9137
Authors
Pertea, Mihaela [1, 2]
Pertea, Geo M. [1, 2]
Antonescu, Corina M. [1, 2]
Chang, Tsung-Cheng [3, 4]
Mendell, Joshua T. [3, 4, 5]
Salzberg, Steven L. [1, 2, 6, 7]
Affiliations
[1] Johns Hopkins Univ, Ctr Computat Biol, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, McKusick Nathans Inst Genet Med, Baltimore, MD USA
[3] Univ Texas SW Med Ctr Dallas, Dept Med Biol, Dallas, TX 75390 USA
[4] Univ Texas SW Med Ctr Dallas, Ctr Regenerat Sci & Med, Dallas, TX 75390 USA
[5] Univ Texas SW Med Ctr Dallas, Simmons Canc Ctr, Dallas, TX 75390 USA
[6] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[7] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
Funding
US National Institutes of Health (NIH);
Keywords
ISOFORM DISCOVERY; QUANTIFICATION; EXPRESSION; ABUNDANCE; REVEALS; ANNOTATION;
DOI
10.1038/nbt.3122
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
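The two percentage figures quoted in the abstract can be checked from the transcript counts it reports. The following is a minimal sketch of that arithmetic only; the helper name `percent_increase` is hypothetical and is not part of StringTie or any other tool named above.

```python
def percent_increase(new: int, baseline: int) -> float:
    """Relative increase of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


# Counts of correctly assembled transcripts, as quoted in the abstract.
# StringTie vs. Cufflinks on 90 million reads from human blood:
real_data = percent_increase(10_990, 7_187)
# StringTie vs. Cufflinks on the simulated data set:
simulated = percent_increase(7_559, 6_310)

print(f"real data: {real_data:.1f}% more transcripts")
print(f"simulated: {simulated:.1f}% more transcripts")
```

Rounded to whole percentages, the two results agree with the 53% and 20% stated in the abstract.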
Pages: 290 / +
Page count: 8
Related papers
34 entries in total
[1]   MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples [J].
Behr, Jonas ;
Kahles, Andre ;
Zhong, Yi ;
Sreedharan, Vipin T. ;
Drewe, Philipp ;
Raetsch, Gunnar .
BIOINFORMATICS, 2013, 29 (20) :2529-2538
[2]   Alternative splicing: New insights from global analyses [J].
Blencowe, Benjamin J. .
CELL, 2006, 126 (01) :37-47
[3]   Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses [J].
Cabili, Moran N. ;
Trapnell, Cole ;
Goff, Loyal ;
Koziol, Magdalena ;
Tazon-Vega, Barbara ;
Regev, Aviv ;
Rinn, John L. .
GENES & DEVELOPMENT, 2011, 25 (18) :1915-1927
[4]  
Dantzig G.B., 1962, Linear Programming and Extensions
[5]   An integrated encyclopedia of DNA elements in the human genome [J].
Dunham, Ian ;
Kundaje, Anshul ;
Aldred, Shelley F. ;
Collins, Patrick J. ;
Davis, Carrie A. ;
Doyle, Francis ;
Epstein, Charles B. ;
Frietze, Seth ;
Harrow, Jennifer ;
Kaul, Rajinder ;
Khatun, Jainab ;
Lajoie, Bryan R. ;
Landt, Stephen G. ;
Lee, Bum-Kyu ;
Pauli, Florencia ;
Rosenbloom, Kate R. ;
Sabo, Peter ;
Safi, Alexias ;
Sanyal, Amartya ;
Shoresh, Noam ;
Simon, Jeremy M. ;
Song, Lingyun ;
Trinklein, Nathan D. ;
Altshuler, Robert C. ;
Birney, Ewan ;
Brown, James B. ;
Cheng, Chao ;
Djebali, Sarah ;
Dong, Xianjun ;
Dunham, Ian ;
Ernst, Jason ;
Furey, Terrence S. ;
Gerstein, Mark ;
Giardine, Belinda ;
Greven, Melissa ;
Hardison, Ross C. ;
Harris, Robert S. ;
Herrero, Javier ;
Hoffman, Michael M. ;
Iyer, Sowmya ;
Kellis, Manolis ;
Khatun, Jainab ;
Kheradpour, Pouya ;
Kundaje, Anshul ;
Lassmann, Timo ;
Li, Qunhua ;
Lin, Xinying ;
Marinov, Georgi K. ;
Merkel, Angelika ;
Mortazavi, Ali .
NATURE, 2012, 489 (7414) :57-74
[6]   Inference of Isoforms from Short Sequence Reads [J].
Feng, Jianxing ;
Li, Wei ;
Jiang, Tao .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2011, 18 (03) :305-321
[7]   Ensembl 2014 [J].
Flicek, Paul ;
Amode, M. Ridwan ;
Barrell, Daniel ;
Beal, Kathryn ;
Billis, Konstantinos ;
Brent, Simon ;
Carvalho-Silva, Denise ;
Clapham, Peter ;
Coates, Guy ;
Fitzgerald, Stephen ;
Gil, Laurent ;
Giron, Carlos Garcia ;
Gordon, Leo ;
Hourlier, Thibaut ;
Hunt, Sarah ;
Johnson, Nathan ;
Juettemann, Thomas ;
Kaehaeri, Andreas K. ;
Keenan, Stephen ;
Kulesha, Eugene ;
Martin, Fergal J. ;
Maurel, Thomas ;
McLaren, William M. ;
Murphy, Daniel N. ;
Nag, Rishi ;
Overduin, Bert ;
Pignatelli, Miguel ;
Pritchard, Bethan ;
Pritchard, Emily ;
Riat, Harpreet S. ;
Ruffier, Magali ;
Sheppard, Daniel ;
Taylor, Kieron ;
Thormann, Anja ;
Trevanion, Stephen J. ;
Vullo, Alessandro ;
Wilder, Steven P. ;
Wilson, Mark ;
Zadissa, Amonida ;
Aken, Bronwen L. ;
Birney, Ewan ;
Cunningham, Fiona ;
Harrow, Jennifer ;
Herrero, Javier ;
Hubbard, Tim J. P. ;
Kinsella, Rhoda ;
Muffato, Matthieu ;
Parker, Anne ;
Spudich, Giulietta ;
Yates, Andy .
NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) :D749-D755
[8]  
Ford L. R., 1962, Flows in networks
[9]  
Garber M, 2011, NAT METHODS, V8, P469, DOI 10.1038/nmeth.1613
[10]   A NEW APPROACH TO THE MAXIMUM-FLOW PROBLEM [J].
GOLDBERG, AV ;
TARJAN, RE .
JOURNAL OF THE ACM, 1988, 35 (04) :921-940