Grinder: a versatile amplicon and shotgun sequence simulator

Cited by: 134
Authors
Angly, Florent E. [1]
Willner, Dana [1,2]
Rohwer, Forest [3]
Hugenholtz, Philip [1,4]
Tyson, Gene W. [1,5]
Affiliations
[1] Australian Ctr Ecogen, Sch Chem & Mol Biosci, Sydney, NSW, Australia
[2] Univ Queensland, Diamantina Inst, St Lucia, Qld 4072, Australia
[3] San Diego State Univ, Dept Biol, San Diego, CA 92182 USA
[4] Univ Queensland, Inst Mol Biosci, St Lucia, Qld 4072, Australia
[5] Univ Queensland, Adv Water Management Ctr, St Lucia, Qld 4072, Australia
Funding
Australian Research Council
Keywords
454 PYROSEQUENCING DATA; RIBOSOMAL-RNA GENE; HIGH-THROUGHPUT; MICROBIAL DIVERSITY; RARE BIOSPHERE; DATA SETS; DATABASE; BIAS; METAGENOMICS; GENOMES;
DOI
10.1093/nar/gks251
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We introduce Grinder (http://sourceforge.net/projects/biogrinder/), an open-source bioinformatic tool to simulate amplicon and shotgun (genomic, metagenomic, transcriptomic and metatranscriptomic) datasets from reference sequences. This is the first tool to simulate amplicon datasets (e.g. 16S rRNA) widely used by microbial ecologists. Grinder can create sequence libraries with a specific community structure, alpha and beta diversities and experimental biases (e.g. chimeras, gene copy number variation) for commonly used sequencing platforms. This versatility allows the creation of simple to complex read datasets necessary for hypothesis testing when developing bioinformatic software, benchmarking existing tools or designing sequence-based experiments. Grinder is particularly useful for simulating clinical or environmental microbial communities and complements the use of in vitro mock communities.
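To make the idea of a read library with a chosen community structure concrete, here is a minimal toy sketch in Python. It is not Grinder's implementation or interface: the function names, the power-law rank-abundance model, and the tiny reference sequences are all assumptions made for illustration, and the sketch ignores sequencing errors, chimeras, gene copy number and genome-length effects.

```python
import random


def power_law_abundances(n_genomes, exponent=1.0):
    """Relative abundances proportional to rank**-exponent, normalised to sum to 1.

    Hypothetical helper: one simple way to impose a community structure.
    """
    weights = [rank ** -exponent for rank in range(1, n_genomes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_shotgun_reads(references, abundances, n_reads, read_length, seed=0):
    """Draw fixed-length, error-free reads at uniform random positions.

    references: dict mapping a genome name to its sequence (str)
    abundances: relative abundance per genome, in the order of `references`
    Assumption: a genome is chosen by abundance alone; genome length is ignored.
    """
    for name, seq in references.items():
        if len(seq) < read_length:
            raise ValueError(f"{name} is shorter than the read length")
    rng = random.Random(seed)  # fixed seed keeps the toy example reproducible
    names = list(references)
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=abundances, k=1)[0]
        seq = references[name]
        start = rng.randint(0, len(seq) - read_length)  # inclusive bounds
        reads.append((name, start, seq[start:start + read_length]))
    return reads


# Made-up reference sequences, for illustration only.
refs = {
    "genome_A": "ACGT" * 50,
    "genome_B": "GGCATTAC" * 25,
    "genome_C": "TTGACA" * 40,
}
abund = power_law_abundances(len(refs), exponent=1.0)
reads = simulate_shotgun_reads(refs, abund, n_reads=1000, read_length=30)
```

Each element of `reads` records the source genome, the start position and the read sequence, so a downstream tool's output can be checked against a known ground truth, which is the benchmarking use the abstract describes.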
Pages: 8