Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data

Cited by: 159
Authors
Curry, Kristen D. [1]
Wang, Qi [2]
Nute, Michael G. [1]
Tyshaieva, Alona [3]
Reeves, Elizabeth [1]
Soriano, Sirena [4]
Wu, Qinglong [5, 6]
Graeber, Enid [3]
Finzer, Patrick [3]
Mendling, Werner [7]
Savidge, Tor [5, 6]
Villapol, Sonia [4]
Dilthey, Alexander [3]
Treangen, Todd J. [1]
Affiliations
[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
[2] Rice Univ, Dept Syst Synthet & Phys Biol Sci, Houston, TX 77251 USA
[3] Heinrich Heine Univ Dusseldorf, Inst Med Microbiol & Hosp Hyg, Dusseldorf, Germany
[4] Houston Methodist Res Inst, Ctr Neuroregenerat, Houston, TX USA
[5] Baylor Coll Med, Dept Pathol & Immunol, Houston, TX 77030 USA
[6] Texas Childrens Hosp, Dept Pathol, Texas Childrens Microbiome Ctr, Houston, TX USA
[7] Helios Univ Clin Wuppertal, German Ctr Infect Gynaecol & Obstet, Wuppertal, Germany
Keywords
VAGINAL MICROBIOTA; GENE DATABASE; CLASSIFICATION; TOOLS;
DOI
10.1038/s41592-022-01520-4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
16S ribosomal RNA-based analysis is the established standard for elucidating the composition of microbial communities. While short-read 16S rRNA analyses are largely confined to genus-level resolution at best, given that only a portion of the gene is sequenced, full-length 16S rRNA gene amplicon sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate often observed in long-read data. Here we present Emu, an approach that uses an expectation-maximization algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from simulated datasets and mock communities show that Emu is capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of Emu by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow with those returned by full-length 16S rRNA gene sequences processed with Emu. Emu accurately estimates microbial abundance using full-length Nanopore 16S rRNA gene sequencing data.
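The abstract states that Emu uses an expectation-maximization algorithm to turn full-length 16S rRNA reads into taxonomic abundance profiles. The sketch below illustrates the general idea of EM-based abundance estimation only; it is not Emu's implementation. The function name `em_abundance`, the input layout (one dictionary of per-taxon read likelihoods per read), and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List


def em_abundance(
    read_likelihoods: List[Dict[str, float]],
    n_iter: int = 200,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Generic EM sketch (hypothetical helper, not Emu's code).

    read_likelihoods[i][taxon] is an assumed P(read i | taxon), e.g. derived
    from alignment scores. Returns estimated relative abundances.
    """
    taxa = sorted({t for lik in read_likelihoods for t in lik})
    if not taxa:
        return {}
    # Start from a uniform abundance profile.
    abundance = {t: 1.0 / len(taxa) for t in taxa}

    for _ in range(n_iter):
        # E-step: split each read across taxa in proportion to
        # (current abundance) x (likelihood of the read under that taxon).
        expected = {t: 0.0 for t in taxa}
        for lik in read_likelihoods:
            weights = {t: abundance[t] * p for t, p in lik.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read cannot be explained by any taxon
            for t, w in weights.items():
                expected[t] += w / total

        # M-step: new abundances are the normalized expected read counts.
        assigned = sum(expected.values())
        if assigned == 0.0:
            break
        updated = {t: c / assigned for t, c in expected.items()}

        # Stop once the profile no longer changes meaningfully.
        delta = max(abs(updated[t] - abundance[t]) for t in taxa)
        abundance = updated
        if delta < tol:
            break

    return abundance


# Toy input (made up): three reads match only taxon A, one matches only
# taxon B, and two match both equally well.
toy_reads = [
    {"A": 1.0}, {"A": 1.0}, {"A": 1.0},
    {"B": 1.0},
    {"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5},
]
profile = em_abundance(toy_reads)
print(profile)  # approaches A = 0.75, B = 0.25
```

In this toy case the ambiguous reads are shared out according to the evidence from the unambiguous ones, so the estimate settles at a 3:1 ratio rather than treating ambiguous reads as an even split.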
Pages: 845 / +
Page count: 17