A machine-learning approach to detecting unknown bacterial serovars

Cited by: 12
Authors
Akova F. [1]
Dundar M. [1]
Davisson V.J. [2, 3]
Hirleman E.D. [4]
Bhunia A.K. [5]
Robinson J.P. [3]
Rajwa B. [3]
Affiliations
[1] Department of Computer and Information Science, Indiana University-Purdue University, Indianapolis
[2] Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, W. Lafayette
[3] Bindley Bioscience Center, Purdue University, W. Lafayette
[4] School of Mechanical Engineering, Purdue University, W. Lafayette
[5] Department of Food Science, Purdue University, W. Lafayette
Source
Statistical Analysis and Data Mining | 2010 / Vol. 3 / Issue 5
Keywords
Anomaly detection; Bayesian classifier; Nonexhaustive training data; Novelty detection
DOI
10.1002/sam.10085
Abstract
Technologies for rapid detection of bacterial pathogens are crucial for securing the food supply. A light-scattering sensor recently developed for real-time identification of multiple colonies has shown great promise for distinguishing bacterial cultures. The classification approach currently used with this system relies on supervised learning. For accurate classification of bacterial pathogens, the training library should be exhaustive, i.e., it should contain samples of all possible pathogens. Yet the sheer number of existing bacterial serovars and, more importantly, the effect of their high mutation rate rule out a practical and manageable training library of this kind. In this study, we propose a Bayesian approach to learning with a nonexhaustive training dataset for automated detection of unknown bacterial serovars, i.e., serovars for which no samples exist in the training library. The main contribution of our work is the use of Wishart conjugate priors defined over class distributions. This allows us to employ the prior information obtained from known classes to make inferences about unknown classes as well. By this means, we identify new classes of informational value and dynamically update the training dataset with these classes to make it increasingly representative of the sample population. This results in a classifier with improved predictive performance for future samples. We evaluated our approach on a 28-class bacteria dataset and, for further validation, on the benchmark 26-class letter recognition dataset. The proposed approach is compared against state-of-the-art methods, including density-based approaches and support vector domain description, as well as a recently introduced Bayesian approach based on simulated classes. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 289-301, 2010.
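The idea in the abstract, sharing prior information across known classes and flagging samples that fit none of them, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes Gaussian class-conditional densities, treats each class mean as fixed at its sample mean, places one shared inverse-Wishart prior on every class covariance (the covariance-side counterpart of the Wishart prior named in the abstract), and uses a fixed log-likelihood threshold in place of the paper's inference procedure. All function names, as well as `prior_scale`, `prior_dof`, and `threshold`, are hypothetical.

```python
import numpy as np


def posterior_mean_covariance(X, prior_scale, prior_dof):
    """Posterior-mean covariance under an inverse-Wishart prior.

    Assumption (not from the paper): the class mean is treated as known
    and set to the sample mean, so the posterior is
    IW(prior_scale + scatter, prior_dof + n).
    """
    n, d = X.shape
    centered = X - X.mean(axis=0)
    scatter = centered.T @ centered
    return (prior_scale + scatter) / (prior_dof + n - d - 1)


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal at point x."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def fit_known_classes(data_by_class, prior_scale, prior_dof):
    """Fit one Gaussian per known class, all sharing the same prior."""
    return {
        label: (X.mean(axis=0),
                posterior_mean_covariance(X, prior_scale, prior_dof))
        for label, X in data_by_class.items()
    }


def classify_or_flag(x, models, threshold):
    """Return the best-fitting known class, or None if x looks novel.

    Hypothetical decision rule: a sample is flagged as belonging to an
    unknown class when its highest class log-likelihood is below threshold.
    """
    scores = {label: gaussian_logpdf(x, mean, cov)
              for label, (mean, cov) in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Synthetic two-class illustration (made-up data, not the bacteria dataset).
rng = np.random.default_rng(0)
training = {
    "A": rng.normal(0.0, 1.0, size=(50, 2)),
    "B": rng.normal(5.0, 1.0, size=(50, 2)),
}
models = fit_known_classes(training, prior_scale=np.eye(2), prior_dof=4.0)
print(classify_or_flag(np.array([0.1, -0.2]), models, threshold=-20.0))
print(classify_or_flag(np.array([20.0, -20.0]), models, threshold=-20.0))
```

In this sketch a flagged sample would be the starting point for a new class; because the prior is shared, even a class seeded with very few samples gets a usable covariance estimate, which is the role the abstract assigns to prior information from known classes.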
Pages: 289-301
Number of pages: 12