Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin

Cited by: 3930
Authors
Bokulich, Nicholas A. [1]
Kaehler, Benjamin D. [2]
Rideout, Jai Ram [1]
Dillon, Matthew [1]
Bolyen, Evan [1]
Knight, Rob [3, 4, 5]
Huttley, Gavin A. [2]
Caporaso, J. Gregory [1, 6]
Affiliations
[1] No Arizona Univ, Pathogen & Microbiome Inst, POB 4073, Flagstaff, AZ 86011 USA
[2] Australian Natl Univ, Res Sch Biol, 46 Sullivans Creek Rd, Acton, ACT 2601, Australia
[3] Univ Calif San Diego, Dept Pediat, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[5] Univ Calif San Diego, Ctr Microbiome Innovat, La Jolla, CA 92093 USA
[6] No Arizona Univ, Dept Biol Sci, 1298 S Knoles Dr,Bldg 56,3rd Floor, Flagstaff, AZ USA
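Funding
US National Science Foundation; UK Medical Research Council
Keywords
BAYESIAN CLASSIFIER; PRIMERS; IDENTIFICATION; DIVERSITY; SELECTION
DOI
10.1186/s40168-018-0470-z
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract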
Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker-gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
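As a rough illustration of what an alignment-based taxonomy consensus method can look like, the sketch below takes the lineages of a query's top alignment hits and keeps the deepest lineage shared by a minimum fraction of them. It is a minimal sketch under stated assumptions, not the plugin's implementation: the function name `consensus_taxonomy`, the `min_consensus` threshold value, the semicolon-delimited lineage format, and the `"Unassigned"` label are all hypothetical choices made for this example.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Return the deepest lineage shared by at least `min_consensus` of the hits.

    `hit_taxonomies` is a list of semicolon-delimited lineage strings, one per
    alignment hit (hypothetical input format, assumed for illustration).
    """
    if not hit_taxonomies:
        return "Unassigned"

    # Split each lineage into its ranks, e.g. ["k__Bacteria", "p__Firmicutes"].
    lineages = [[rank.strip() for rank in t.split(";")] for t in hit_taxonomies]
    total = len(lineages)
    consensus = []

    for depth in range(max(len(lineage) for lineage in lineages)):
        # Count whole lineage prefixes so a rank is only compared in context.
        prefixes = Counter(
            tuple(lineage[: depth + 1]) for lineage in lineages if len(lineage) > depth
        )
        prefix, count = prefixes.most_common(1)[0]
        # Stop when support is too low or the best prefix leaves the agreed path.
        if count / total < min_consensus or list(prefix[:depth]) != consensus:
            break
        consensus = list(prefix)

    return "; ".join(consensus) if consensus else "Unassigned"


hits = [
    "k__Bacteria; p__Firmicutes; g__Lactobacillus",
    "k__Bacteria; p__Firmicutes; g__Lactobacillus",
    "k__Bacteria; p__Firmicutes; g__Streptococcus",
]
print(consensus_taxonomy(hits))
print(consensus_taxonomy(hits, min_consensus=0.9))
```

Raising the threshold trades depth for agreement: with a stricter `min_consensus`, the same three hits are resolved only to the rank on which nearly all of them agree, which is one simple example of the kind of parameter tuning the abstract refers to.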
Pages: 17