Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization
Authors:
Fautt, Chad [1,2,3]
Couradeau, Estelle [2,3]
Hockett, Kevin L. [1,3]
Affiliations:
[1] Penn State Univ, Dept Plant Pathol & Environm Microbiol, University Pk, PA 16802 USA
[2] Penn State Univ, Dept Ecosyst Sci & Management, University Pk, PA 16802 USA
[3] Penn State Univ, Intercoll Grad Degree Program Ecol, University Pk, PA 16802 USA
The Pseudomonas syringae species complex (PSSC) is a diverse group of plant pathogens with a collective host range encompassing almost every food crop grown today. Because the complex threatens global food security, rapid detection and characterization of epidemic and emerging pathogenic lineages are essential. However, phylogenetic identification is often complicated by an unresolved and ever-changing taxonomy, which makes it difficult to use available databases in practice and to train classifiers properly. As a result, although amplicon sequencing is a common method for routine identification of PSSC isolates, there is no efficient method for accurate classification based on these data. Here we present a suite of five Naïve Bayes classifiers for PCR primer sets widely used for PSSC identification, trained on in silico amplicon data from 2,161 published PSSC genomes using the Life Identification Number (LIN) hierarchical clustering algorithm in place of traditional Linnaean taxonomy. Additionally, we include a dataset for translating classification results back into traditional taxonomic nomenclature (i.e., species, phylogroup, pathovar) and for predicting virulence factor repertoires.
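For readers unfamiliar with the approach, the general idea of a Naïve Bayes sequence classifier can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the feature representation, so the use of overlapping k-mer counts is an assumption, and the class name `KmerNaiveBayes`, the labels `group_A` and `group_B` (stand-ins for LIN-derived groups), and the short sequences are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return all overlapping k-mers of a DNA sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace smoothing.

    Hypothetical helper for illustration only; k-mer features are an
    assumption, not something stated in the abstract.
    """

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.class_counts = Counter()            # training sequences per label
        self.kmer_counts = defaultdict(Counter)  # k-mer counts per label
        self.vocab = set()                       # all k-mers seen in training

    def fit(self, seqs, labels):
        for seq, label in zip(seqs, labels):
            self.class_counts[label] += 1
            for km in kmers(seq, self.k):
                self.kmer_counts[label][km] += 1
                self.vocab.add(km)
        return self

    def log_scores(self, seq):
        """Return the unnormalized log posterior for each label."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, c in self.class_counts.items():
            total = sum(self.kmer_counts[label].values())
            score = math.log(c / n)  # log prior
            for km in kmers(seq, self.k):
                if km not in self.vocab:
                    continue  # skip k-mers never seen in training
                count = self.kmer_counts[label][km]
                score += math.log((count + self.alpha) / (total + self.alpha * v))
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Made-up toy sequences; real training data would be in silico amplicons.
train_seqs = [
    "ACGTACGTACGTACGT",
    "ACGTACGAACGTACGT",
    "TTGGCCAATTGGCCAA",
    "TTGGCCAATTGGCAAA",
]
train_labels = ["group_A", "group_A", "group_B", "group_B"]

clf = KmerNaiveBayes(k=3).fit(train_seqs, train_labels)
print(clf.predict("ACGTACGTACGA"))
```

In this sketch each label's score is the log prior plus the sum of smoothed log k-mer likelihoods, and the query is assigned to the highest-scoring label. Replacing the toy labels with hierarchical cluster identifiers would follow the same pattern.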