BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

Cited by: 114
Authors
Graham, Elaina D. [1]
Heidelberg, John F. [1]
Tully, Benjamin J. [1, 2]
Affiliations
[1] Univ Southern Calif, Dept Biol Sci, Los Angeles, CA USA
[2] Ctr Dark Energy Biosphere Investigat, Los Angeles, CA USA
Keywords
Affinity propagation; Metagenomics; Microbial ecology; Metagenome-assembled genomes; Clustering; Binning; METAGENOMIC CONTIGS; CODON USAGE; GENOMES; COMMUNITY; TOOL
DOI
10.7717/peerj.3035
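Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract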
Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of 'binning' contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes.
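The compositional features named in the abstract (percent GC content and tetranucleotide frequency) can be illustrated with a minimal sketch. This is not BinSanity's own code: the function names `gc_fraction` and `tetranucleotide_frequencies` are hypothetical, and the sketch assumes plain overlapping 4-mer counting without merging reverse complements, which binning tools often do in practice.

```python
from collections import Counter
from itertools import product


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of each of the 256 possible 4-mers in a contig.

    Assumption for illustration: overlapping windows, ambiguous windows
    (containing N or other symbols) are ignored, and reverse complements
    are NOT merged.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


if __name__ == "__main__":
    contig = "ACGTACGTGGCC"  # toy sequence, not real data
    print(gc_fraction(contig))
    print(tetranucleotide_frequencies(contig)["ACGT"])
```

Each contig then yields a 256-value frequency vector plus one GC value, which is the kind of composition profile a refinement step could compare between contigs that coverage-based clustering placed in the same bin.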
Pages: 19
Related papers
42 total
[1]  
Alneberg J, 2014, NAT METHODS, V11, P1144, DOI [10.1038/NMETH.3103, 10.1038/nmeth.3103]
[2]   Metagenomic resolution of microbial functions in deep-sea hydrothermal plumes across the Eastern Lau Spreading Center [J].
Anantharaman, Karthik ;
Breier, John A. ;
Dick, Gregory J. .
ISME JOURNAL, 2016, 10 (01) :225-239
[3]   Analysis of intra-genomic GC content homogeneity within prokaryotes [J].
Bohlin, Jon ;
Snipen, Lars ;
Hardy, Simon P. ;
Kristoffersen, Anja B. ;
Lagesen, Karin ;
Donsvik, Torunn ;
Skjerve, Eystein ;
Ussery, David W. .
BMC GENOMICS, 2010, 11
[4]   Impact of library preparation protocols and template quantity on the metagenomic reconstruction of a mock microbial community [J].
Bowers, Robert M. ;
Clum, Alicia ;
Tice, Hope ;
Lim, Joanne ;
Singh, Kanwar ;
Ciobanu, Doina ;
Ngan, Chew Yee ;
Cheng, Jan-Fang ;
Tringe, Susannah G. ;
Woyke, Tanja .
BMC GENOMICS, 2015, 16
[5]   Codon usage between genomes is constrained by genome-wide mutational processes [J].
Chen, SL ;
Lee, W ;
Hottes, AK ;
Shapiro, L ;
McAdams, HH .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 (10) :3480-3485
[6]  
Chuang CC, 2015, IEEE ICCE, P230, DOI 10.1109/ICCE-TW.2015.7216871
[7]   Community-wide analysis of microbial genome sequence signatures [J].
Dick, Gregory J. ;
Andersson, Anders F. ;
Baker, Brett J. ;
Simmons, Sheri L. ;
Yelton, A. Pepper ;
Banfield, Jillian F. .
GENOME BIOLOGY, 2009, 10 (08)
[8]   Anvi'o: an advanced analysis and visualization platform for 'omics data [J].
Eren, A. Murat ;
Esen, Ozcan C. ;
Quince, Christopher ;
Vineis, Joseph H. ;
Morrison, Hilary G. ;
Sogin, Mitchell L. ;
Delmont, Tom O. .
PEERJ, 2015, 3
[9]  
Flynn SD, 2013, Google Patents, Patent No. [US 20070203872 A1, 20070203872]
[10]   Clustering by passing messages between data points [J].
Frey, Brendan J. ;
Dueck, Delbert .
SCIENCE, 2007, 315 (5814) :972-976