The binning of metagenomic contigs for physiology of mixed cultures

Cited by: 152
Authors
Strous, Marc [1, 2]
Kraft, Beate [1]
Bisdorf, Regina [2]
Tegetmeyer, Halina E. [1, 2]
Affiliations
[1] Max Planck Inst Marine Microbiol, Microbial Fitness Grp, D-28359 Bremen, Germany
[2] Univ Bielefeld, Ctr Biotechnol, Inst Genome Res & Syst Biol, D-33615 Bielefeld, Germany
Funding
European Research Council;
Keywords
metagenomics; binning; tetranucleotide frequencies; interpolated Markov models; METAGENOMIC ANALYSIS; GENOMES; DNA;
DOI
10.3389/fmicb.2012.00410
Chinese Library Classification
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
So far, microbial physiology has dedicated itself mainly to pure cultures. In nature, cross feeding and competition are important aspects of microbial physiology, and these can only be addressed by studying complete communities such as enrichment cultures. Metagenomic sequencing is a powerful tool to characterize such mixed cultures. In the analysis of metagenomic data, well-established algorithms exist for the assembly of short reads into contigs and for the annotation of predicted genes. However, the binning of the assembled contigs or unassembled reads is still a major bottleneck, even though it is required to understand how the overall metabolism is partitioned over different community members. Binning consists of the clustering of contigs or reads that apparently originate from the same source population. In the present study, eight metagenomic samples from the same habitat, a laboratory enrichment culture, were sequenced. Each sample contained 13-23 Mb of assembled contigs and up to eight abundant populations. Binning was attempted with existing methods, but they produced poor results, were slow, depended on non-standard platforms, or produced errors. A new binning procedure was developed based on multivariate statistics of tetranucleotide frequencies combined with the use of interpolated Markov models. Its performance was evaluated by comparing the results between samples with BLAST and by comparison to existing algorithms for four publicly available metagenomes and one previously published artificial metagenome. The accuracy of the new approach was comparable to or higher than that of existing methods. Further, it was up to 100 times faster. It was implemented in Java Swing as a complete open source graphical binning application available for download and further development (http://sourceforge.net/projects/metawatt).
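As an illustration of the feature the abstract refers to, the following is a minimal Python sketch of how a tetranucleotide frequency profile could be computed for a contig and how two profiles could be compared. It is not the MetaWatt implementation (which is written in Java); the function names `tetranucleotide_frequencies` and `euclidean_distance`, the single-strand counting, and the choice of Euclidean distance are assumptions made only for this example. The multivariate statistics and interpolated Markov models mentioned in the abstract are not shown.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over A, C, G, T, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the relative frequency of each tetranucleotide in `sequence`.

    Hypothetical helper: counts overlapping 4-mers on the given strand only.
    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts.get(t, 0) for t in TETRAMERS]
    total = sum(valid)
    if total == 0:
        # Sequence too short or without any unambiguous 4-mer.
        return [0.0] * len(TETRAMERS)
    return [c / total for c in valid]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length frequency profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy sequences, invented for illustration only.
contig_a = tetranucleotide_frequencies("ACGTACGTACGTTTGACA")
contig_b = tetranucleotide_frequencies("GGGCCCGGGCCCGGGCCC")
print(euclidean_distance(contig_a, contig_b))
```

Contigs whose profiles lie close together under such a distance are candidates for the same bin; a real binning tool would additionally need to handle reverse complements, contig length, and cluster assignment.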
Pages: 11