GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes

Cited by: 108
Authors
Boyd, Joel A. [1]
Woodcroft, Ben J. [1]
Tyson, Gene W. [1]
Affiliations
[1] Univ Queensland, Sch Chem & Mol Biosci, Australian Ctr Ecogen, St Lucia, Qld 4072, Australia
Funding
Australian Research Council; US Department of Energy
Keywords
MICROBIAL GENOMES; MAXIMUM-LIKELIHOOD; METABOLISM; SEQUENCES; ALIGNMENT; IDENTIFICATION; PERFORMANCE; PLACEMENT; BACTERIA; BIOLOGY;
DOI
10.1093/nar/gky174
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences by placing them into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with a processing time 2.0-3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages.
Pages: 9