SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

Cited by: 84
Authors
Epping, Lennard [1 ,2 ]
van Tonder, Andries J. [3 ]
Gladstone, Rebecca A. [3 ]
Bentley, Stephen D. [3 ]
Page, Andrew J. [1 ,4 ]
Keane, Jacqueline A. [1 ]
Affiliations
[1] Wellcome Sanger Inst, Pathogen Informat, Hinxton CB10 1SA, Cambs, England
[2] Robert Koch Inst, Microbial Genom, Berlin, Germany
[3] Wellcome Sanger Inst, Infect Genom, Hinxton CB10 1SA, Cambs, England
[4] Norwich Res Pk, Quadram Inst, Norwich, Norfolk, England
Source
MICROBIAL GENOMICS | 2018 / Volume 4 / Issue 07
Funding
Wellcome Trust (UK);
Keywords
Streptococcus pneumoniae; serotyping; pneumococcal; whole genome sequencing; k-mer method; PNEUMOCOCCAL DISEASE; VACCINATION; DISCOVERY; CHILDREN; LOCUS; PCR;
DOI
10.1099/mgen.0.000186
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Streptococcus pneumoniae is responsible for 240 000-460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15-21x. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sangerpathogens/seroba
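The abstract does not spell out how the k-mer approach works, so the following is only a minimal illustrative sketch of the general idea behind k-mer-based typing, not SeroBA's actual implementation. It assumes that each candidate type is represented by a single reference sequence and that the best candidate is the one whose k-mers are most completely covered by the reads. The function names (`kmers`, `kmer_coverage`, `best_reference`), the toy sequences, and the value `k=5` are all hypothetical choices made for illustration.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_coverage(reads, reference, k):
    """Fraction of the reference's distinct k-mers found in at least one read."""
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(ref_kmers & read_kmers) / len(ref_kmers)


def best_reference(reads, references, k=5):
    """Score every candidate reference and return (best_name, all_scores).

    `references` maps a hypothetical type name to its reference sequence.
    """
    scores = {name: kmer_coverage(reads, seq, k)
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data, invented for illustration only (not real cps locus sequences).
references = {"typeA": "ACGTACGGTCA", "typeB": "TTGACCATGGA"}
reads = ["ACGTACG", "TACGGTC", "CGGTCA"]

best, scores = best_reference(reads, references, k=5)
print(best, scores)
```

A real tool would need to handle sequencing errors, reverse complements, and closely related types that differ by only a few variants, none of which this sketch attempts.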
Pages: 6
Related papers
50 items in total
  • [31] Generate gene expression profile from high-throughput sequencing data
    Liu, Hui
    Jiang, Zhichao
    Fang, Xiangzhong
    Fu, Hanjiang
    Zheng, Xiaofei
    Cha, Lei
    Li, Wuju
    FRONTIERS OF MATHEMATICS IN CHINA, 2011, 6 (06) : 1131 - 1145
  • [32] ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
    Heller, David
    Krestel, Ralf
    Ohler, Uwe
    Vingron, Martin
    Marsico, Annalisa
    NUCLEIC ACIDS RESEARCH, 2017, 45 (19) : 11004 - 11018
  • [33] High-throughput sequencing of Bacillus anthracis in France: investigating genome diversity and population structure using whole-genome SNP discovery
    Girault, Guillaume
    Blouin, Yann
    Vergnaud, Gilles
    Derzelle, Sylviane
    BMC GENOMICS, 2014, 15
  • [34] High-throughput sequencing of Bacillus anthracis in France: investigating genome diversity and population structure using whole-genome SNP discovery
    Guillaume Girault
    Yann Blouin
    Gilles Vergnaud
    Sylviane Derzelle
    BMC Genomics, 15
  • [35] AmpliVar: Mutation Detection in High-Throughput Sequence from Amplicon-Based Libraries
    Hsu, Arthur L.
    Kondrashova, Olga
    Lunke, Sebastian
    Love, Clare J.
    Meldrum, Cliff
    Marquis-Nicholson, Renate
    Corboy, Greg
    Kym Pham
    Wakefield, Matthew
    Waring, Paul M.
    Taylor, Graham R.
    HUMAN MUTATION, 2015, 36 (04) : 411 - 418
  • [36] Epidemiological investigation of Pseudomonas aeruginosa isolates from a six-year-long hospital outbreak using high-throughput whole genome sequencing
    Snyder, L. A.
    Loman, N. J.
    Faraj, L. A.
    Levi, K.
    Weinstock, G.
    Boswell, T. C.
    Pallen, M. J.
    Ala'Aldeen, D. A.
    EUROSURVEILLANCE, 2013, 18 (42) : 17 - 25
  • [37] Viroscope: Plant viral diagnosis from high-throughput sequencing data using biologically-informed genome assembly coverage
    Valenzuela, Sandro L.
    Norambuena, Tomas
    Morgante, Veronica
    Garcia, Francisca
    Jimenez, Juan C.
    Nunez, Carlos
    Fuentes, Ignacia
    Pollak, Bernardo
    FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [38] High-Throughput Screening Approach for Nanoporous Materials Genome Using Topological Data Analysis: Application to Zeolites
    Lee, Yongjin
    Barthel, Senja D.
    Dlotko, Pawel
    Moosavi, Seyed Mohamad
    Hess, Kathryn
    Smit, Berend
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2018, 14 (08) : 4427 - 4437
  • [39] Implementation of a high-throughput whole genome sequencing approach with the goal of maximizing efficiency and cost effectiveness to improve public health
    Dickinson, Michelle C.
    Wirth, Samantha E.
    Baker, Deborah
    Kidney, Anna M.
    Mitchell, Kara K.
    Nazarian, Elizabeth J.
    Shudt, Matthew
    Thompson, Lisa M.
    Venkata, Sai Laxmi Gubbala
    Musser, Kimberlee A.
    Mingle, Lisa
    MICROBIOLOGY SPECTRUM, 2024, 12 (04):
  • [40] Recent Innovations and Technical Advances in High-Throughput Parallel Single-Cell Whole-Genome Sequencing Methods
    Qiao, Yi
    Cheng, Tianguang
    Miao, Zikun
    Cui, Yue
    Tu, Jing
    SMALL METHODS, 2024,