Background: The naive Bayesian classifier provided by the Ribosomal Database Project (RDP) is currently one of the most widely used tools for classifying 16S rRNA sequences, which are mainly collected from environmental samples. We show that the RDP classifier has 97+% assignment accuracy and is fast for reads of 250 bp and longer when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa.

Principal Findings: Because each fragment is assigned a score (containing likelihood or confidence information, such as the bootstrap score in the RDP classifier), we "train" a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naive Bayesian methods. Nonetheless, our method performs better with the RDP classifier than with the other methods tested, as measured by the F-measure and the area under the receiver operating characteristic curve on the test set. Constraining the database to well-represented genera improves sensitivity by 3-15%. Finally, we show that the detector is a good predictor of novel abundant taxa (especially at finer levels of taxonomy, where novelty is more likely to be present).

Conclusions: We conclude that selecting a read-length-appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance, while having genera that are "highly" similar does not make a significant improvement.

On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates with) whether novel sequences will be assigned to new taxa when the RDP database "doubles" in the future.
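The threshold "training" described under Principal Findings can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy scores, the rule that a read is flagged as novel when its score falls below the threshold, and the choice to pick the threshold that maximizes the F-measure on the training set are all assumptions made for illustration. Scores are assumed to be bootstrap-style confidences expressed as fractions in [0, 1].

```python
from typing import List, Tuple


def confusion(scores: List[float], is_novel: List[bool],
              threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), treating 'novel' as the positive class.

    Assumption: a read is flagged as novel when its score is below threshold.
    """
    tp = fp = tn = fn = 0
    for score, novel in zip(scores, is_novel):
        flagged = score < threshold
        if flagged and novel:
            tp += 1
        elif flagged and not novel:
            fp += 1
        elif not flagged and not novel:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def train_threshold(scores: List[float], is_novel: List[bool]) -> float:
    """Pick the candidate threshold with the highest training F-measure."""
    # Candidates: every observed score, plus one value above the maximum
    # so that "flag everything" is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1e-9]

    def score_of(threshold: float) -> float:
        tp, fp, _tn, fn = confusion(scores, is_novel, threshold)
        return f_measure(tp, fp, fn)

    return max(candidates, key=score_of)


def auc(scores: List[float], is_novel: List[bool]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random novel read scores lower than a
    random known read, with ties counted as one half.
    """
    novel = [s for s, n in zip(scores, is_novel) if n]
    known = [s for s, n in zip(scores, is_novel) if not n]
    wins = sum(1.0 if n < k else 0.5 if n == k else 0.0
               for n in novel for k in known)
    return wins / (len(novel) * len(known))


# Toy data (hypothetical, not taken from the study).
train_scores = [0.98, 0.91, 0.85, 0.60, 0.72, 0.40, 0.35, 0.10]
train_novel = [False, False, False, False, True, True, True, True]
test_scores = [0.99, 0.88, 0.80, 0.30, 0.55, 0.90]
test_novel = [False, False, False, True, True, True]

threshold = train_threshold(train_scores, train_novel)
tp, fp, tn, fn = confusion(test_scores, test_novel, threshold)
print("threshold:", threshold)
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("test AUC:", auc(test_scores, test_novel))
```

In this sketch the threshold is learned on one set and scored on a separate set, mirroring the train/test split mentioned in the abstract. Other selection criteria (for example, fixing a minimum specificity) would fit the same structure by swapping the function passed as `key`.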