BMC3C: binning metagenomic contigs using codon usage, sequence composition and read coverage

Cited by: 34
Authors
Yu, Guoxian [1]
Jiang, Yuan [1]
Wang, Jun [1]
Zhang, Hao [2,3]
Luo, Haiwei [2,3]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Chinese Univ Hong Kong, Sch Life Sci, Shatin, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Partner State Key Lab Agrobiotechnol, Shatin, Hong Kong, Peoples R China
Keywords
PHYLOGENETIC CLASSIFICATION; GENOMES; ALGORITHM
DOI
10.1093/bioinformatics/bty519
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Metagenomics investigates DNA sequences recovered directly from environmental samples. Analysis often starts with read assembly, which yields contigs rather than complete genomes. Contig binning methods are therefore used to group the contigs into genome bins. Several clustering-based binning methods have been developed, but they generally suffer from problems of stability and robustness.

Results: We introduce BMC3C, an ensemble clustering-based method that bins contigs accurately and robustly by using DNA sequence Composition, Coverage across multiple samples and Codon usage. BMC3C first searches for an appropriate number of clusters and then repeatedly applies k-means clustering with different initializations to cluster the contigs. Next, it derives a weighted graph from these clusterings, in which each node represents a contig. The weight between two contigs is high if they are frequently grouped into the same cluster and low otherwise. Finally, BMC3C uses a graph partitioning technique to split the weighted graph into subgraphs, each corresponding to a genome bin. We evaluate BMC3C on both simulated and real-world datasets and compare it with state-of-the-art binning tools; BMC3C performs better than these tools. To our knowledge, this is the first time that codon usage features and ensemble clustering have been used in metagenomic contig binning.
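The ensemble idea described in the abstract (repeated k-means runs, a co-association weight between every pair of contigs, then a partition of the resulting graph) can be illustrated with a minimal sketch. This is not the BMC3C implementation. It assumes the number of clusters `k` is already known, uses toy two-dimensional feature vectors in place of real composition, coverage and codon usage features, and substitutes a simple stand-in for the graph partitioning step (thresholding the weights and taking connected components), because the abstract does not name the technique used. The function names `kmeans`, `co_association` and `partition` are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(points, k, runs=10, seed=0):
    """Fraction of k-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)  # each run draws a new initialization
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / runs for count in row] for row in counts]


def partition(weights, threshold=0.5):
    """Assumed stand-in for graph partitioning: keep edges with weight >= threshold
    and return the connected components as bins of point indices."""
    n = len(weights)
    seen, bins = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and weights[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        bins.append(sorted(component))
    return bins


# Toy feature vectors for six contigs drawn from two well-separated "genomes".
contigs = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (5.0, 5.0), (5.1, 5.1), (5.2, 5.2)]
weights = co_association(contigs, k=2)
bins = partition(weights)
print(bins)
```

On this toy input the two groups are far enough apart that every run agrees, so the weights are either 1 or 0. On real data, the weights would take intermediate values, and the choice of partitioning method and threshold would matter much more than it does here.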
Pages: 4172-4179
Page count: 8