FRAMA: from RNA-seq data to annotated mRNA assemblies

Cited by: 19
Authors
Bens, Martin [1]
Sahm, Arne [1]
Groth, Marco [1]
Jahn, Niels [1]
Morhart, Michaela [2]
Holtze, Susanne [2]
Hildebrandt, Thomas B. [2]
Platzer, Matthias [1]
Szafranski, Karol [1]
Affiliations
[1] Leibniz Inst Ageing, Fritz Lipmann Inst, D-07745 Jena, Germany
[2] Leibniz Inst Zoo & Wildlife Res, D-10315 Berlin, Germany
Keywords
RNA-seq; Transcriptome assembly; Full-length mRNA; Naked mole-rat; NAKED MOLE-RAT; EXPRESSED SEQUENCE TAGS; WHEAT TRANSCRIPTOME; GENOME; GENE; GENERATION; LONGEVITY; ADAPTATIONS; DIVERGENCE; INSIGHTS;
DOI
10.1186/s12864-015-2349-8
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Advances in second-generation RNA sequencing have made near-complete characterization of transcriptomes affordable. However, reconstructing full-length mRNAs by de novo RNA-seq assembly remains difficult because eukaryotic transcriptomes are complex, containing highly similar paralogs and multiple alternative splice variants. Here, we present FRAMA, a genome-independent annotation tool for de novo mRNA assemblies that addresses several post-assembly tasks, such as reduction of contig redundancy, ortholog assignment, correction of misassembled transcripts, scaffolding of fragmented transcripts and coding sequence identification.

Results: We applied FRAMA to assemble and annotate the transcriptome of the naked mole-rat and assessed the quality of the resulting transcript compilation with the aid of publicly available naked mole-rat gene annotations. Based on a de novo transcriptome assembly (Trinity), FRAMA annotated 21,984 naked mole-rat mRNAs (12,100 full-length CDSs), corresponding to 16,887 genes. Scaffolding of 3488 genes increased the median sequence information 1.27-fold. In total, FRAMA detected and corrected 4774 misassembled genes, which were predominantly caused by gene fusions. A comparison with three different sources of naked mole-rat transcripts revealed that FRAMA's gene models are better supported by RNA-seq data than any other transcript set. Further, our results demonstrate that FRAMA is competitive with state-of-the-art genome-based transcript reconstruction approaches.

Conclusion: FRAMA enables the de novo construction of a low-redundancy transcript catalog for eukaryotes, including the extension and refinement of transcripts. The results delivered by FRAMA thus provide a basis for comprehensive downstream analyses such as gene expression studies or comparative transcriptomics. FRAMA is available at https://github.com/gengit/FRAMA.
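The abstract lists coding sequence identification among FRAMA's post-assembly tasks and reports 12,100 full-length CDSs, that is, open reading frames running from a start codon to an in-frame stop codon. The Python sketch below illustrates that notion with a generic longest-ORF scan on the forward strand of a single transcript. It is not FRAMA's procedure, which the abstract does not describe; the function name `longest_orf` and the restriction to the standard ATG start codon and TAA/TAG/TGA stop codons are assumptions made for illustration.

```python
# Hypothetical illustration: a generic longest-ORF scan on the forward strand.
# This is NOT FRAMA's CDS-identification method; it only shows what a
# "full-length CDS" (start codon through in-frame stop codon) looks like.
from typing import Optional, Tuple

# Assumption: standard genetic code stop codons only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Returns None if the sequence contains no complete ORF.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None  # first ATG of the currently open reading frame, if any
        # Step codon by codon; stop before a trailing partial codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                # Keep the longest complete ORF seen across all frames.
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATTTTAGGG"
    print(longest_orf(transcript))  # (2, 14), i.e. ATGAAATTTTAG
```

A production tool would also scan the reverse complement, because contigs assembled from unstranded libraries can be in either orientation, and might combine ORF length with other evidence, such as similarity to known proteins, rather than simply taking the longest ORF.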
Pages: 12