Accurate read-based metagenome characterization using a hierarchical suite of unique signatures

Cited by: 112
Authors
Freitas, Tracey Allen K. [1]
Li, Po-E [1]
Scholz, Matthew B. [1]
Chain, Patrick S. G. [1]
Affiliation
[1] Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM 87545, USA
Keywords
Escherichia coli; Sequence; Generation; Diversity
DOI
10.1093/nar/gkv180
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
A major challenge in shotgun metagenomics is accurately identifying the organisms present in a microbial community from the classification of short sequence reads. Existing community profiling methods attempt to rapidly classify the millions of reads produced by modern sequencers. However, several factors have together led to unacceptably high false discovery rates (FDR):

- incomplete databases,
- similarity among otherwise divergent genomes,
- errors and biases in sequencing technologies, and
- the large data volumes that metagenome sequencing requires.

Here, we present a novel gene-independent, signature-based metagenomic taxonomic profiling method with a significantly and consistently lower FDR than any other available method. Our algorithm, Genomic Origins Through Taxonomic CHAllenge (GOTTCHA), circumvents false positives by using a series of non-redundant signature databases. GOTTCHA was tested and validated on 20 synthetic and mock datasets varying in community composition and complexity. It was also applied successfully to data generated from spiked environmental and clinical samples, and it robustly outperforms other available tools.
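The following minimal Python sketch makes the signature idea concrete through hierarchical unique-signature profiling. It is not GOTTCHA's implementation. It assumes exact k-mer matching and uses toy genomes and a two-rank lineage table. The function names (`kmers`, `unique_signatures`, `build_hierarchy`, `profile`, `false_discovery_rate`) and the `min_hits` threshold are all hypothetical.

At each rank, the sketch pools the k-mers of every genome in a taxon and discards any k-mer found in more than one taxon. As a result, a signature hit can point to only one taxon. A read is counted only when all of its signature hits agree. FDR is computed as the fraction of reported taxa that are absent from the known community.

```python
# Minimal sketch of hierarchical unique-signature read profiling.
# Hypothetical illustration only; not GOTTCHA's actual implementation.
from collections import Counter
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the distinct k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(taxon_kmers: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only k-mers found in exactly one taxon (a non-redundant database)."""
    counts = Counter(km for ks in taxon_kmers.values() for km in ks)
    return {t: {km for km in ks if counts[km] == 1} for t, ks in taxon_kmers.items()}


def build_hierarchy(genomes: Dict[str, str], lineage: Dict[str, Dict[str, str]],
                    ranks: List[str], k: int) -> Dict[str, Dict[str, Set[str]]]:
    """Build one unique-signature database per rank by pooling genomes per taxon."""
    db: Dict[str, Dict[str, Set[str]]] = {}
    for rank in ranks:
        pooled: Dict[str, Set[str]] = {}
        for gid, seq in genomes.items():
            pooled.setdefault(lineage[gid][rank], set()).update(kmers(seq, k))
        db[rank] = unique_signatures(pooled)
    return db


def profile(reads: Iterable[str], signatures: Dict[str, Set[str]], k: int,
            min_hits: int = 1) -> Dict[str, int]:
    """Count reads whose signature hits all point to one taxon; drop taxa below min_hits."""
    index = {km: t for t, ks in signatures.items() for km in ks}
    hits: Counter = Counter()
    for read in reads:
        taxa = {index[km] for km in kmers(read, k) if km in index}
        if len(taxa) == 1:  # unambiguous; reads with zero or conflicting hits are skipped
            hits[next(iter(taxa))] += 1
    return {t: n for t, n in hits.items() if n >= min_hits}


def false_discovery_rate(reported: Iterable[str], truth: Iterable[str]) -> float:
    """FDR = false positives / all reported taxa (0.0 if nothing is reported)."""
    called = set(reported)
    if not called:
        return 0.0
    return len(called - set(truth)) / len(called)


# Toy data (hypothetical): three genomes, two ranks, k = 4.
genomes = {"g1": "AAAACCCC", "g2": "AAAAGGGG", "g3": "TTTTGGGG"}
lineage = {
    "g1": {"species": "sp1", "genus": "G1"},
    "g2": {"species": "sp2", "genus": "G1"},
    "g3": {"species": "sp3", "genus": "G2"},
}
db = build_hierarchy(genomes, lineage, ["species", "genus"], k=4)
reads = ["AACCC", "ACCCC", "TTGGG", "TTTGG", "AAGGG"]

print(profile(reads, db["species"], k=4))  # {'sp1': 2, 'sp3': 2, 'sp2': 1}
print(profile(reads, db["genus"], k=4))    # {'G1': 3, 'G2': 2}
truth = {"sp1", "sp3"}  # hypothetical known community
print(false_discovery_rate(profile(reads, db["species"], k=4), truth))              # 1/3
print(false_discovery_rate(profile(reads, db["species"], k=4, min_hits=2), truth))  # 0.0
```

In this toy example, sp2 is assumed absent from the community. Its single read therefore produces a low-abundance call that raises the species-level FDR to 1/3. Requiring `min_hits=2` removes that call. This illustrates how a minimum-evidence filter can trade some sensitivity for a lower FDR. Real profilers work with far larger databases and tune such thresholds empirically.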
Pages: 14