CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Cited by: 7147
Authors
Parks, Donovan H. [1]
Imelfort, Michael [1]
Skennerton, Connor T. [1]
Hugenholtz, Philip [1,2]
Tyson, Gene W. [1,3]
Affiliations
[1] Univ Queensland, Sch Chem & Mol Biosci, Australian Ctr Ecogen, St Lucia, Qld 4072, Australia
[2] Univ Queensland, Inst Mol Biosci, St Lucia, Qld 4072, Australia
[3] Univ Queensland, Adv Water Management Ctr, St Lucia, Qld 4072, Australia
Funding
Australian Research Council; Natural Sciences and Engineering Research Council of Canada
Keywords
MAXIMUM-LIKELIHOOD; BACTERIAL; SEQUENCES; PHYLOGENY; METABOLISM; PLACEMENT; COVERAGE; TAXONOMY; INSIGHTS; TOOL;
DOI
10.1101/gr.186072.114
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of "marker" genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.
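As an illustration of how completeness and contamination can be estimated from collocated marker genes, here is a minimal Python sketch. It is not taken from this record: the function name `estimate_quality`, the input layout, and the averaging over marker sets are assumptions made for illustration, and the actual CheckM implementation (lineage-specific marker selection via placement in a reference genome tree) is considerably more involved.

```python
from collections import Counter


def estimate_quality(marker_sets, observed_hits):
    """Estimate completeness and contamination as percentages.

    marker_sets   -- list of sets of marker gene IDs; each set groups genes
                     assumed to be collocated (hypothetical input format)
    observed_hits -- iterable of marker gene IDs found in the genome,
                     with one entry per copy found
    """
    if not marker_sets or any(len(s) == 0 for s in marker_sets):
        raise ValueError("marker_sets must be non-empty sets")

    counts = Counter(observed_hits)
    completeness = 0.0
    contamination = 0.0

    for marker_set in marker_sets:
        # Fraction of the set's markers seen at least once.
        present = sum(1 for gene in marker_set if counts[gene] >= 1)
        # Extra copies beyond the first are treated as contamination.
        extra = sum(counts[gene] - 1 for gene in marker_set if counts[gene] > 1)
        completeness += present / len(marker_set)
        contamination += extra / len(marker_set)

    n_sets = len(marker_sets)
    return 100.0 * completeness / n_sets, 100.0 * contamination / n_sets


if __name__ == "__main__":
    sets = [{"geneA", "geneB"}, {"geneC"}]
    hits = ["geneA", "geneA", "geneC"]
    comp, cont = estimate_quality(sets, hits)
    print(f"completeness={comp:.1f}% contamination={cont:.1f}%")
```

Averaging per set rather than per gene means that a cluster of collocated markers lost or duplicated together counts once, which is the intuition behind using collocation information.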
Pages: 1043-1055
Page count: 13