GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes

Cited by: 108
Authors
Boyd, Joel A. [1]
Woodcroft, Ben J. [1]
Tyson, Gene W. [1]
Affiliations
[1] Univ Queensland, Sch Chem & Mol Biosci, Australian Ctr Ecogen, St Lucia, Qld 4072, Australia
Funding
Australian Research Council; United States Department of Energy
Keywords
MICROBIAL GENOMES; MAXIMUM-LIKELIHOOD; METABOLISM; SEQUENCES; ALIGNMENT; IDENTIFICATION; PERFORMANCE; PLACEMENT; BACTERIA; BIOLOGY;
DOI
10.1093/nar/gky174
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers, and GraftM was found to have higher accuracy at the family level, with a processing time 2.0-3.7x faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages.
Pages: 9