Swarm v2: highly-scalable and high-resolution amplicon clustering

Cited by: 401
Authors
Mahe, Frederic [1]
Rognes, Torbjorn [2, 3]
Quince, Christopher [4]
de Vargas, Colomban [5, 6]
Dunthorn, Micah [1]
Affiliations
[1] Tech Univ Kaiserslautern, Dept Ecol, Kaiserslautern, Germany
[2] Univ Oslo, Dept Informat, N-0316 Oslo, Norway
[3] Natl Hosp Norway, Oslo Univ Hosp, Dept Microbiol, Oslo, Norway
[4] Univ Warwick, Warwick Med Sch, Warwick, England
[5] CNRS, Stn Biol Roscoff, EPEP Evolut Protistes & Ecosyst Pelag, UMR 7144, Roscoff, France
[6] Univ Paris 06, Sorbonne Univ, Stn Biol Roscoff UMR7144, Roscoff, France
Source
PEERJ | 2015 / Volume 3
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Environmental diversity; Barcoding; Molecular operational taxonomic units; OPERATIONAL TAXONOMIC UNITS; CILIATE ENVIRONMENTAL DIVERSITY; SEQUENCING DATA; RARE BIOSPHERE; COMMUNITIES; WRINKLES; ACCURATE; REGIONS;
DOI
10.7717/peerj.1420
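Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract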
Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
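The abstract does not spell out how the d = 1 algorithm reaches linear scaling. One common way to get that behaviour, sketched here as an assumption rather than as Swarm's actual implementation, is to generate every sequence that is one edit away from an amplicon and look each one up in a hash table, instead of comparing all amplicon pairs. The names `microvariants` and `cluster_d1` are hypothetical, and the abundance rule used to stop chaining is a simplified stand-in for the integrated breaking phase.

```python
from collections import deque

ALPHABET = "ACGT"


def microvariants(seq):
    """Yield every sequence one substitution, deletion, or insertion away from seq.

    Duplicates may be yielded; the caller tolerates them.
    """
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]  # deletion at position i
        for base in ALPHABET:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]  # substitution at position i
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            yield seq[:i] + base + seq[i:]  # insertion before position i


def cluster_d1(abundances):
    """Group dereplicated amplicons (sequence -> abundance) into OTUs at d = 1."""
    # Seeds are visited from most to least abundant; ties are broken by sequence.
    order = sorted(abundances, key=lambda s: (-abundances[s], s))
    assigned = set()
    otus = []
    for seed in order:
        if seed in assigned:
            continue
        assigned.add(seed)
        otu = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            for variant in microvariants(current):
                # Hash lookup replaces a comparison against every other amplicon.
                if variant in abundances and variant not in assigned:
                    # Assumed breaking rule: never climb to a more abundant amplicon.
                    if abundances[variant] <= abundances[current]:
                        assigned.add(variant)
                        otu.append(variant)
                        queue.append(variant)
        otus.append(otu)
    return otus


example = {"ACGT": 10, "ACGA": 3, "ACGAA": 1, "TTTT": 5, "TTTA": 1}
print(cluster_d1(example))
# → [['ACGT', 'ACGA', 'ACGAA'], ['TTTT', 'TTTA']]
```

In this sketch the number of lookups per amplicon depends on its length and not on how many amplicons there are, which is why total work grows roughly linearly with the size of the dataset.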
Pages: 12
Related papers
24 in total
  • [1] Depicting more accurate pictures of protistan community complexity using pyrosequencing of hypervariable SSU rRNA gene regions
    Behnke, Anke
    Engel, Matthias
    Christen, Richard
    Nebel, Markus
    Klein, Rolf R.
    Stoeck, Thorsten
    [J]. ENVIRONMENTAL MICROBIOLOGY, 2011, 13 (02) : 340 - 349
  • [2] Divergence thresholds and divergent biodiversity estimates: can metabarcoding reliably describe zooplankton communities?
    Brown, Emily A.
    Chain, Frederic J. J.
    Crease, Teresa J.
    MacIsaac, Hugh J.
    Cristescu, Melania E.
    [J]. ECOLOGY AND EVOLUTION, 2015, 5 (11) : 2234 - 2251
  • [3] QIIME allows analysis of high-throughput community sequencing data
    Caporaso, J. Gregory
    Kuczynski, Justin
    Stombaugh, Jesse
    Bittinger, Kyle
    Bushman, Frederic D.
    Costello, Elizabeth K.
    Fierer, Noah
    Pena, Antonio Gonzalez
    Goodrich, Julia K.
    Gordon, Jeffrey I.
    Huttley, Gavin A.
    Kelley, Scott T.
    Knights, Dan
    Koenig, Jeremy E.
    Ley, Ruth E.
    Lozupone, Catherine A.
    McDonald, Daniel
    Muegge, Brian D.
    Pirrung, Meg
    Reeder, Jens
    Sevinsky, Joel R.
    Turnbaugh, Peter J.
    Walters, William A.
    Widmann, Jeremy
    Yatsunenko, Tanya
    Zaneveld, Jesse
    Knight, Rob
    [J]. NATURE METHODS, 2010, 7 (05) : 335 - 336
  • [4] Defining DNA-Based Operational Taxonomic Units for Microbial-Eukaryote Ecology
    Caron, David A.
    Countway, Peter D.
    Savai, Pratik
    Gast, Rebecca J.
    Schnetzer, Astrid
    Moorthi, Stefanie D.
    Dennett, Mark R.
    Moran, Dawn M.
    Jones, Adriane C.
    [J]. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 2009, 75 (18) : 5797 - 5808
  • [5] Eukaryotic plankton diversity in the sunlit ocean
    de Vargas, Colomban
    Audic, Stephane
    Henry, Nicolas
    Decelle, Johan
    Mahe, Frederic
    Logares, Ramiro
    Lara, Enrique
    Berney, Cedric
    Le Bescot, Noan
    Probert, Ian
    Carmichael, Margaux
    Poulain, Julie
    Romac, Sarah
    Colin, Sebastien
    Aury, Jean-Marc
    Bittner, Lucie
    Chaffron, Samuel
    Dunthorn, Micah
    Engelen, Stefan
    Flegontova, Olga
    Guidi, Lionel
    Horak, Ales
    Jaillon, Olivier
    Lima-Mendez, Gipsi
    Lukes, Julius
    Malviya, Shruti
    Morard, Raphael
    Mulot, Matthieu
    Scalco, Eleonora
    Siano, Raffaele
    Vincent, Flora
    Zingone, Adriana
    Dimier, Celine
    Picheral, Marc
    Searson, Sarah
    Kandels-Lewis, Stefanie
    Acinas, Silvia G.
    Bork, Peer
    Bowler, Chris
    Gorsky, Gabriel
    Grimsley, Nigel
    Hingamp, Pascal
    Iudicone, Daniele
    Not, Fabrice
    Ogata, Hiroyuki
    Pesant, Stephane
    Raes, Jeroen
    Sieracki, Michael E.
    Speich, Sabrina
    Stemmann, Lars
    [J]. SCIENCE, 2015, 348 (6237)
  • [6] Comparing the Hyper-Variable V4 and V9 Regions of the Small Subunit rDNA for Assessment of Ciliate Environmental Diversity
    Dunthorn, Micah
    Klier, Julia
    Bunge, John
    Stoeck, Thorsten
    [J]. JOURNAL OF EUKARYOTIC MICROBIOLOGY, 2012, 59 (02) : 185 - 187
  • [7] Search and clustering orders of magnitude faster than BLAST
    Edgar, Robert C.
    [J]. BIOINFORMATICS, 2010, 26 (19) : 2460 - 2461
  • [8] Deep sequencing uncovers protistan plankton diversity in the Portuguese Ria Formosa solar saltern ponds
    Filker, Sabine
    Gimmler, Anna
    Dunthorn, Micah
    Mahe, Frederic
    Stoeck, Thorsten
    [J]. EXTREMOPHILES, 2015, 19 (02) : 283 - 295
  • [9] CD-HIT: accelerated for clustering the next-generation sequencing data
    Fu, Limin
    Niu, Beifang
    Zhu, Zhengwei
    Wu, Sitao
    Li, Weizhong
    [J]. BIOINFORMATICS, 2012, 28 (23) : 3150 - 3152
  • [10] DNACLUST: accurate and efficient clustering of phylogenetic marker genes
    Ghodsi, Mohammadreza
    Liu, Bo
    Pop, Mihai
    [J]. BMC BIOINFORMATICS, 2011, 12