Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning

Cited by: 89
Authors
Cleary, Brian [1, 2]
Brito, Ilana Lauren [2, 3, 4]
Huang, Katherine [2]
Gevers, Dirk [2]
Shea, Terrance [2]
Young, Sarah [2]
Alm, Eric J. [2, 3, 4]
Affiliations
[1] MIT, Computat & Syst Biol Program, Cambridge, MA 02139 USA
[2] Broad Inst Harvard & MIT, Cambridge, MA 02142 USA
[3] MIT, Dept Biol Engn, Cambridge, MA 02139 USA
[4] MIT, Ctr Microbiome Informat & Therapeut, Cambridge, MA 02139 USA
Keywords
DIVERSITY; ASSEMBLER; ACCURATE; COVERAGE; GENOMES; REVEAL; VELVET;
DOI
10.1038/nbt.3329
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Analyses of metagenomic datasets that are sequenced to a depth of billions or trillions of bases can uncover hundreds of microbial genomes, but naive assembly of these data is computationally intensive, requiring hundreds of gigabytes to terabytes of RAM. We present latent strain analysis (LSA), a scalable, de novo pre-assembly method that separates reads into biologically informed partitions and thereby enables assembly of individual genomes. LSA is implemented with a streaming calculation of unobserved variables that we call eigengenomes. Eigengenomes reflect covariance in the abundance of short, fixed-length sequences, or k-mers. As the abundance of each genome in a sample is reflected in the abundance of each k-mer in that genome, eigengenome analysis can be used to partition reads from different genomes. This partitioning can be done in fixed memory using tens of gigabytes of RAM, which makes assembly and downstream analyses of terabytes of data feasible on commodity hardware. Using LSA, we assemble partial and near-complete genomes of bacterial taxa present at relative abundances as low as 0.00001%. We also show that LSA is sensitive enough to separate reads from several strains of the same species.
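The partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' LSA implementation (which is streaming and built for terabyte-scale data); it is a toy, in-memory version under stated assumptions: the two synthetic genomes, the helper names (`abundance_matrix`, `latent_coordinates`, `greedy_cosine_clusters`, `partition_read`), the k-mer length, and the 0.9 cosine threshold are all hypothetical choices made for illustration. It counts k-mers across samples, projects the k-mer abundance profiles onto the leading singular vectors (standing in for eigengenomes), groups k-mers whose profiles covary, and assigns each read to the group most of its k-mers fall in.

```python
from collections import Counter

import numpy as np


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def abundance_matrix(samples, k):
    """Count k-mers per sample; rows are k-mers, columns are samples."""
    counts = [Counter(km for read in reads for km in kmers(read, k))
              for reads in samples]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[km] for c in counts] for km in vocab], dtype=float)
    return vocab, matrix


def latent_coordinates(matrix, n_components):
    """Project unit-normalized k-mer profiles onto the leading right singular vectors."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(unit, full_matrices=False)
    return unit @ vt[:n_components].T


def greedy_cosine_clusters(coords, threshold):
    """Join the most similar existing center if cosine >= threshold, else open a new cluster."""
    unit = coords / np.linalg.norm(coords, axis=1, keepdims=True)
    centers, labels = [], []
    for row in unit:
        sims = [float(row @ c) for c in centers]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            centers.append(row)
            labels.append(len(centers) - 1)
    return labels


def partition_read(read, k, kmer_label):
    """Assign a read to the partition that most of its k-mers belong to."""
    votes = Counter(kmer_label[km] for km in kmers(read, k) if km in kmer_label)
    return votes.most_common(1)[0][0]


# Toy data (assumption): two made-up genomes with no shared 4-mers, mixed at
# different ratios in three samples so their k-mer abundances covary differently.
genome_a, genome_b = "AAAACCCCAAAA", "GGGGTTTTGGGG"
samples = [
    [genome_a] * 9 + [genome_b] * 1,
    [genome_a] * 5 + [genome_b] * 5,
    [genome_a] * 1 + [genome_b] * 9,
]
K = 4

vocab, matrix = abundance_matrix(samples, K)
labels = greedy_cosine_clusters(latent_coordinates(matrix, n_components=2), threshold=0.9)
kmer_label = dict(zip(vocab, labels))
part_a = partition_read(genome_a, K, kmer_label)
part_b = partition_read(genome_b, K, kmer_label)
```

In this toy setup, every k-mer from one genome shares the same normalized abundance profile across samples, so the two genomes separate into different partitions. Real data would need far more samples, a scalable decomposition, and a more careful clustering step.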
Pages: 1053 / +
Page count: 10
Related papers
34 in total
  • [1] Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes
    Albertsen, Mads
    Hugenholtz, Philip
    Skarshewski, Adam
    Nielsen, Kare L.
    Tyson, Gene W.
    Nielsen, Per H.
    [J]. NATURE BIOTECHNOLOGY, 2013, 31 (06) : 533 - +
  • [2] Alneberg J, 2014, NAT METHODS, V11, P1144, DOI [10.1038/nmeth.3103, 10.1038/NMETH.3103]
  • [3] Enterotypes of the human gut microbiome
    Arumugam, Manimozhiyan
    Raes, Jeroen
    Pelletier, Eric
    Le Paslier, Denis
    Yamada, Takuji
    Mende, Daniel R.
    Fernandes, Gabriel R.
    Tap, Julien
    Bruls, Thomas
    Batto, Jean-Michel
    Bertalan, Marcelo
    Borruel, Natalia
    Casellas, Francesc
    Fernandez, Leyden
    Gautier, Laurent
    Hansen, Torben
    Hattori, Masahira
    Hayashi, Tetsuya
    Kleerebezem, Michiel
    Kurokawa, Ken
    Leclerc, Marion
    Levenez, Florence
    Manichanh, Chaysavanh
    Nielsen, H. Bjorn
    Nielsen, Trine
    Pons, Nicolas
    Poulain, Julie
    Qin, Junjie
    Sicheritz-Ponten, Thomas
    Tims, Sebastian
    Torrents, David
    Ugarte, Edgardo
    Zoetendal, Erwin G.
    Wang, Jun
    Guarner, Francisco
    Pedersen, Oluf
    de Vos, Willem M.
    Brunak, Soren
    Dore, Joel
    Weissenbach, Jean
    Ehrlich, S. Dusko
    Bork, Peer
    [J]. NATURE, 2011, 473 (7346) : 174 - 180
  • [4] Global biogeography of highly diverse protistan communities in soil
    Bates, Scott T.
    Clemente, Jose C.
    Flores, Gilberto E.
    Walters, William Anthony
    Parfrey, Laura Wegener
    Knight, Rob
    Fierer, Noah
    [J]. ISME JOURNAL, 2013, 7 (03) : 652 - 659
  • [5] Ray Meta: scalable de novo metagenome assembly and profiling
    Boisvert, Sebastien
    Raymond, Frederic
    Godzaridis, Elenie
    Laviolette, Francois
    Corbeil, Jacques
    [J]. GENOME BIOLOGY, 2012, 13 (12):
  • [6] The metagenomics of soil
    Daniel, R
    [J]. NATURE REVIEWS MICROBIOLOGY, 2005, 3 (06) : 470 - 478
  • [7] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • [9] A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera
    Deloger, Marc
    El Karoui, Meriem
    Petit, Marie-Agnes
    [J]. JOURNAL OF BACTERIOLOGY, 2009, 191 (01) : 91 - 99
  • [10] Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB
    DeSantis, T. Z.
    Hugenholtz, P.
    Larsen, N.
    Rojas, M.
    Brodie, E. L.
    Keller, K.
    Huber, T.
    Dalevi, D.
    Hu, P.
    Andersen, G. L.
    [J]. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 2006, 72 (07) : 5069 - 5072