Comprehensive assessment of the quality of Salmonella whole genome sequence data available in public sequence databases using the Salmonella in silico Typing Resource (SISTR)

Cited by: 34
Authors
Robertson, James [1]
Yoshida, Catherine [2]
Kruczkiewicz, Peter [2]
Nadon, Celine [2]
Nichani, Anil [1]
Taboada, Eduardo N. [3]
Nash, John Howard Eagles [4]
Affiliations
[1] Publ Hlth Agcy Canada, Natl Microbiol Lab, Guelph, ON, Canada
[2] Publ Hlth Agcy Canada, Natl Microbiol Lab, Winnipeg, MB, Canada
[3] Publ Hlth Agcy Canada, Natl Microbiol Lab, Lethbridge, AB, Canada
[4] Publ Hlth Agcy Canada, Natl Microbiol Lab, Toronto, ON, Canada
Keywords
Salmonella; Public Health; whole genome sequencing; serotyping; surveillance; phenotype prediction; BY-GENE APPROACH; GENBANK; HEALTH; MLST
DOI
10.1099/mgen.0.000151
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Public health and food safety institutions around the world are adopting whole genome sequencing (WGS) to replace conventional methods of characterizing Salmonella for surveillance and outbreak response. Falling costs and increased throughput of WGS have resulted in an explosion of data, but questions remain as to the reliability and robustness of those data. Because serovar information is critically important to public health, it is essential to have reliable serovar assignments available for all Salmonella records. The current study systematically assessed and curated all Salmonella genomes in the Sequence Read Archive (SRA) to determine the state of the data and their utility. A total of 67 758 genomes were assembled de novo and quality-assessed for their assembly metrics as well as their species and serovar assignments. A total of 42 400 genomes passed all of the quality criteria, but 30.16% of genomes were deposited without serotype information. These data were used to compare the concordance of reported and predicted serovars for two in silico prediction tools, multi-locus sequence typing (MLST) and the Salmonella in silico Typing Resource (SISTR), whose predictions were fully concordant with the reported serovar for 87.51% and 91.91% of the tested isolates, respectively. Concordance of the in silico predictions increased when serovar variants were grouped together, to 89.25% for MLST and 94.98% for SISTR. This study represents the first large-scale validation of serovar information in public genomes and provides a large validated set of genomes that can be used to benchmark new bioinformatics tools.
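As an illustration of the concordance measure described in the abstract, the following minimal Python sketch compares reported and predicted serovars, first by exact match and then after collapsing serovar variants into groups. The function names, the toy records, and the variant-to-group mapping are hypothetical; they are not taken from the study, from SISTR, or from any MLST tool.

```python
def normalize(serovar):
    """Trim and lower-case a serovar label so trivial differences are ignored."""
    return serovar.strip().lower()


def concordance(pairs, groups=None):
    """Return the percentage of (reported, predicted) pairs that agree.

    groups: optional dict mapping a serovar variant to its parent group
    (a hypothetical mapping, for illustration only).
    """
    groups = {normalize(k): normalize(v) for k, v in (groups or {}).items()}
    matches = 0
    for reported, predicted in pairs:
        r, p = normalize(reported), normalize(predicted)
        # Collapse each label to its group when a mapping is supplied.
        r, p = groups.get(r, r), groups.get(p, p)
        matches += r == p
    return 100.0 * matches / len(pairs) if pairs else 0.0


# Toy records: (reported serovar, predicted serovar); invented for illustration.
records = [
    ("Enteritidis", "Enteritidis"),
    ("Typhimurium", "I 4,[5],12:i:-"),
    ("Newport", "Newport"),
    ("Heidelberg", "Kentucky"),
]
# Assumed grouping that treats the monophasic variant as Typhimurium.
variant_groups = {"I 4,[5],12:i:-": "Typhimurium"}

exact = concordance(records)
grouped = concordance(records, variant_groups)
print(f"exact: {exact:.2f}%  grouped: {grouped:.2f}%")
```

In this toy set, grouping variants can only raise or preserve the concordance figure, which mirrors the direction of the change reported in the abstract; the actual grouping rules used in the study are not reproduced here.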
Pages: 11