A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-tuples

Cited by: 106
Authors
Wu, Yu-Wei [1]
Ye, Yuzhen [1]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47408 USA
Keywords
binning; EM algorithm; metagenomics; Poisson distribution; PHYLOGENETIC CLASSIFICATION; MICROBIAL COMMUNITIES; MAXIMUM-LIKELIHOOD; GUT MICROBIOME; RECONSTRUCTION; ENVIRONMENTS; GENOMES; TREE
DOI
10.1089/cmb.2010.0245
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify the sequences in a metagenomic dataset into different bins (i.e., species), based on the DNA composition patterns (e.g., tetramer frequencies) of different genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because DNA composition patterns vary substantially within a single genome. We developed a novel approach (AbundanceBin) for metagenomic binning that utilizes the different abundances of species living in the same environment. AbundanceBin applies the Lander-Waterman model to metagenomics and is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimates of species abundances, as well as their genome sizes, two important parameters for characterizing a microbial community. We also show that AbundanceBin performed well when the sequences are very short (e.g., 75 bp) or contain sequencing errors. By combining AbundanceBin and a composition-based method (MetaCluster), we can achieve even higher binning accuracy. Supplementary Material is available at www.liebertonline.com/cmb.
Pages: 523-534
Page count: 12
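
The abstract and keywords name three ingredients: l-tuple counts from reads, a Poisson distribution, and an EM algorithm. The following is a minimal sketch of how those pieces can fit together, assuming that the count of each l-tuple across all reads is drawn from a mixture of Poisson distributions with one rate per abundance level. The function names (`count_ltuples`, `fit_poisson_mixture`), the fixed number of components, and the initial rates are illustrative assumptions; this is not the AbundanceBin implementation, and it omits read assignment and genome size estimation, which the abstract does not describe in detail.

```python
import math
from collections import Counter


def count_ltuples(reads, l):
    """Count every length-l substring (l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def poisson_log_pmf(n, lam):
    """Log of the Poisson probability of observing count n with mean lam."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def fit_poisson_mixture(values, init_lams, iters=100):
    """EM for a mixture of Poissons (hypothetical helper, not from the paper).

    values    -- observed l-tuple counts
    init_lams -- initial guess for the Poisson mean of each component
    Returns (weights, lams, responsibilities).
    """
    lams = list(init_lams)
    k = len(lams)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each count,
        # computed in log space for numerical stability.
        resp = []
        for n in values:
            logs = [math.log(weights[j]) + poisson_log_pmf(n, lams[j])
                    for j in range(k)]
            m = max(logs)
            exps = [math.exp(x - m) for x in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mixing weights and Poisson means.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            weights[j] = max(rj / len(values), 1e-12)
            weighted_sum = sum(r[j] * n for r, n in zip(resp, values))
            lams[j] = max(weighted_sum / max(rj, 1e-12), 1e-9)
    return weights, lams, resp
```

In this sketch, the fitted means play the role of abundance levels: l-tuples from a highly abundant species tend to have high counts and receive high responsibility under the component with the larger mean. The counts passed to `fit_poisson_mixture` would typically be `list(count_ltuples(reads, l).values())`.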