CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Times cited: 7147
Authors
Parks, Donovan H. [1]
Imelfort, Michael [1]
Skennerton, Connor T. [1]
Hugenholtz, Philip [1, 2]
Tyson, Gene W. [1, 3]
Affiliations
[1] Univ Queensland, Sch Chem & Mol Biosci, Australian Ctr Ecogen, St Lucia, Qld 4072, Australia
[2] Univ Queensland, Inst Mol Biosci, St Lucia, Qld 4072, Australia
[3] Univ Queensland, Adv Water Management Ctr, St Lucia, Qld 4072, Australia
Funding
Australian Research Council; Natural Sciences and Engineering Research Council of Canada
Keywords
MAXIMUM-LIKELIHOOD; BACTERIAL; SEQUENCES; PHYLOGENY; METABOLISM; PLACEMENT; COVERAGE; TAXONOMY; INSIGHTS; TOOL
DOI
10.1101/gr.186072.114
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of "marker" genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.
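The abstract describes CheckM's estimates only at a high level: marker genes are grouped into sets of collocated genes, and completeness and contamination are derived from which markers are found and how many times. The Python sketch below shows one way such estimates can be computed. It assumes that each collocated set contributes equally, that a set's completeness contribution is the fraction of its markers found at least once, and that contamination counts copies beyond the first. The function name `checkm_style_estimates` and its input format are hypothetical. This is not CheckM's code or API, and the lineage-specific marker search itself is assumed to have already been done.

```python
def checkm_style_estimates(marker_sets, hit_counts):
    """Hypothetical sketch: estimate completeness and contamination (percent).

    marker_sets: list of non-empty sets of marker gene IDs, each set holding
                 markers assumed to be collocated (assumed input format).
    hit_counts:  dict mapping marker gene ID -> number of copies found in the
                 genome; missing IDs are treated as 0 copies.
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")
    completeness = 0.0
    contamination = 0.0
    for s in marker_sets:
        if not s:
            raise ValueError("marker sets must be non-empty")
        # Fraction of markers in this set found at least once.
        present = sum(1 for g in s if hit_counts.get(g, 0) >= 1)
        # Extra copies beyond the first, summed over the set's markers.
        extra = sum(max(hit_counts.get(g, 0) - 1, 0) for g in s)
        completeness += present / len(s)
        contamination += extra / len(s)
    n = len(marker_sets)
    # Every set is weighted equally, so average over sets and scale to percent.
    return 100.0 * completeness / n, 100.0 * contamination / n


if __name__ == "__main__":
    sets = [{"A", "B"}, {"C"}, {"D", "E", "F", "G"}]
    counts = {"A": 1, "B": 1, "C": 2, "D": 1, "E": 0, "F": 1}
    comp, cont = checkm_style_estimates(sets, counts)
    print(f"completeness={comp:.2f}% contamination={cont:.2f}%")
```

Grouping collocated markers into sets keeps genes that tend to travel together, such as those in a single operon, from being counted as independent evidence. Because of this, a genome missing one operon is penalised roughly once and not once per gene.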
Pages: 1043–1055
Page count: 13