A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

Cited by: 69
Authors
Francis, Warren R. [1, 2]
Christianson, Lynne M. [1]
Kiko, Rainer [3]
Powers, Meghan L. [1, 2]
Shaner, Nathan C. [4]
Haddock, Steven H. D. [1]
Affiliations
[1] Monterey Bay Aquarium Res Inst, Moss Landing, CA 95039 USA
[2] Univ Calif Santa Cruz, Dept Ocean Sci, Santa Cruz, CA 95064 USA
[3] GEOMAR, Helmholtz Ctr Ocean Res Kiel, D-24105 Kiel, Germany
[4] Scintillon Inst, San Diego, CA 92121 USA
Source
BMC GENOMICS | 2013 / Vol. 14
Keywords
RNA-SEQ DATA; DIFFERENTIAL EXPRESSION; GENES; NORMALIZATION;
DOI
10.1186/1471-2164-14-167
Chinese Library Classification number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Subject classification number
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes.
Results: We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters.
Conclusions: These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies.
Pages: 1 / 12
Number of pages: 11