HAYSTAC: A Bayesian framework for robust and rapid species identification in high-throughput sequencing data

Cited by: 7
Authors
Dimopoulos, Evangelos A. [1]
Carmagnini, Alberto [2, 3]
Velsko, Irina M. [4]
Warinner, Christina [4, 5]
Larson, Greger [1]
Frantz, Laurent A. F. [2, 3]
Irving-Pease, Evan K. [1, 6]
Affiliations
[1] Univ Oxford, Palaeogen & Bioarchaeol Res Network, Res Lab Archaeol & Hist Art, Oxford, England
[2] Ludwig Maximilians Univ Munchen, Palaeogen Grp, Dept Vet Sci, Munich, Germany
[3] Queen Mary Univ London, Sch Biol & Chem Sci, London, England
[4] Max Planck Inst Sci Human Hist, Dept Archaeogenet, Jena, Germany
[5] Harvard Univ, Dept Anthropol, Cambridge, MA 02138 USA
[6] Univ Copenhagen, Lundbeck Fdn, Globe Inst, GeoGenet Ctr, Copenhagen, Denmark
Funding
Wellcome Trust (UK); European Research Council
Keywords
METAGENOMIC ANALYSIS; YERSINIA-PESTIS; ANCIENT; DNA; GENOMES; TUBERCULOSIS; PATHOGENS; VICTIMS; REMAINS;
DOI
10.1371/journal.pcbi.1010493
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Identification of specific species in metagenomic samples is critical for several key applications, yet many available tools require large computational power and are often prone to false positive identifications. Here we describe High-AccuracY and Scalable Taxonomic Assignment of MetagenomiC data (HAYSTAC), which can estimate the probability that a specific taxon is present in a metagenome. HAYSTAC provides a user-friendly tool to construct databases, based on publicly available genomes, that are used for competitive read mapping. It then uses a novel Bayesian framework to infer the abundance and statistical support for each species identification and to provide per-read species classification. Unlike other methods, HAYSTAC is specifically designed to efficiently handle both ancient and modern DNA data, as well as incomplete reference databases, making it possible to run highly accurate hypothesis-driven analyses (i.e., assessing the presence of a specific species) on variably sized reference databases while dramatically improving processing speeds. We tested the performance and accuracy of HAYSTAC using simulated Illumina libraries, both with and without ancient DNA damage, and compared the results to other currently available methods (i.e., Kraken2/Bracken, KrakenUniq, MALT/HOPS, and Sigma). HAYSTAC identified fewer false positives than Kraken2/Bracken, KrakenUniq, and MALT in all simulations, and fewer than Sigma in simulations of ancient data. It uses less memory than Kraken2/Bracken, KrakenUniq, and MALT during both database construction and sample analysis. Lastly, we used HAYSTAC to search for specific pathogens in two published ancient metagenomic datasets, demonstrating how it can be applied to empirical datasets. HAYSTAC is available from https://github.com/antonisdim/HAYSTAC.
Pages: 30