Machine learning-based predictive modeling to identify genotypic traits associated with Salmonella enterica disease endpoints in isolates from ground chicken

Times cited: 24
Authors
Tanui, Collins K. [1, 2]
Karanth, Shraddha [1]
Njage, Patrick M. K. [3]
Meng, Jianghong [1, 2, 4]
Pradhan, Abani K. [1, 2]
Affiliations
[1] Univ Maryland, Dept Nutr & Food Sci, 0112 Skinner Bldg, College Pk, MD 20742 USA
[2] Univ Maryland, Ctr Food Safety & Secur Syst, College Pk, MD 20742 USA
[3] Tech Univ Denmark, Natl Food Inst, Res Grp Genom Epidemiol, DK-2800 Lyngby, Denmark
[4] Univ Maryland, Joint Inst Food Safety & Appl Nutr, College Pk, MD 20742 USA
Keywords
Predictive modeling; Machine learning; Whole genome sequencing; Salmonella; COMPLETE GENOME SEQUENCE; SEROVAR TYPHIMURIUM; ESCHERICHIA-COLI; UNITED-STATES; PATHOGENICITY; VIRULENCE; CLASSIFICATION; CAMPYLOBACTER; RESISTANCE; MULTIDRUG
DOI
10.1016/j.lwt.2021.112701
Chinese Library Classification number
TS2 [Food industry]
Subject classification code
0832
Abstract
As the cost of sequencing foodborne pathogen genomes decreases, it has become feasible to sequence large numbers of isolates and analyze them with machine learning algorithms. This study aimed to use machine learning algorithms to predict disease endpoints for unlabeled Salmonella genome sequences from isolates obtained from ground chicken. Using a semi-supervised approach, our models recognized genetic patterns in the test dataset based on a training dataset compiled from an extensive literature review. With known genotypes as input features, the semi-supervised random forest model achieved the highest overall accuracy, 0.94 (95% confidence interval: 0.85-0.99), with a kappa value of 0.82, and predicted 87% of the disease endpoints. The model also identified genes associated with specific disease endpoints; these genes were linked to virulence and could be used as features in future predictive modeling. Our machine learning approach could be useful in several areas of food safety, including identifying pathogen sources, predicting antibiotic resistance, and assessing the risk posed by foodborne pathogens.
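The performance figures quoted in the abstract (overall accuracy with a 95% confidence interval, and kappa) can both be derived from a confusion matrix of reference versus predicted disease endpoints. The sketch below is illustrative only. It assumes the interval is an exact (Clopper-Pearson) binomial interval on accuracy and that kappa is Cohen's kappa; the abstract does not state either choice. The function names `binom_cdf`, `clopper_pearson`, and `accuracy_and_kappa` are hypothetical, and the example matrix is made up and is not the study's data.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); exact sum, suited to a few hundred isolates."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""

    def bisect(f, increasing: bool) -> float:
        # Find p in [0, 1] with f(p) == alpha / 2, where f is monotone in p.
        lo, hi = 0.0, 1.0
        for _ in range(200):
            mid = (lo + hi) / 2
            if (f(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2; this tail grows with p.
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p), True)
    # Upper bound solves P(X <= x | p) = alpha/2; this tail shrinks as p grows.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p), False)
    return lower, upper


def accuracy_and_kappa(cm: list[list[int]]) -> tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes; swapping
    the orientation gives the same accuracy and kappa.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / total
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total**2
    return observed, (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (NOT the study's data).
    cm = [[20, 2, 1],
          [1, 15, 2],
          [0, 1, 18]]
    acc, kappa = accuracy_and_kappa(cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    lo, hi = clopper_pearson(correct, sum(map(sum, cm)))
    print(f"accuracy={acc:.2f} (95% CI {lo:.2f}-{hi:.2f}), kappa={kappa:.2f}")
```

Kappa is reported alongside accuracy because it discounts the agreement expected by chance from the class frequencies alone. This matters when some disease endpoints are much more common than others.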
Pages: 8
References: 84