SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

Cited by: 84
Authors
Epping, Lennard [1 ,2 ]
van Tonder, Andries J. [3 ]
Gladstone, Rebecca A. [3 ]
Bentley, Stephen D. [3 ]
Page, Andrew J. [1 ,4 ]
Keane, Jacqueline A. [1 ]
Affiliations
[1] Wellcome Sanger Inst, Pathogen Informat, Hinxton CB10 1SA, Cambs, England
[2] Robert Koch Inst, Microbial Genom, Berlin, Germany
[3] Wellcome Sanger Inst, Infect Genom, Hinxton CB10 1SA, Cambs, England
[4] Norwich Res Pk, Quadram Inst, Norwich, Norfolk, England
Source
MICROBIAL GENOMICS | 2018 / Volume 4 / Issue 07
Funding
Wellcome Trust (UK);
Keywords
Streptococcus pneumoniae; serotyping; pneumococcal; whole genome sequencing; k-mer method; PNEUMOCOCCAL DISEASE; VACCINATION; DISCOVERY; CHILDREN; LOCUS; PCR;
DOI
10.1099/mgen.0.000186
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Streptococcus pneumoniae is responsible for 240 000-460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data, but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98% concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15-21x. SeroBA is implemented in Python 3 and is freely available under an open-source GPLv3 licence from https://github.com/sangerpathogens/seroba
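The abstract describes the k-mer approach only at a high level. As a rough illustration of the general idea, and not SeroBA's actual implementation, the following minimal Python sketch ranks candidate reference sequences by the fraction of their k-mers that also occur in a set of reads. The function names (`kmers`, `rank_references`), the toy sequences, the labels `serotype_A` and `serotype_B`, and the choice of k = 5 are all assumptions made for this example.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(reads, references, k=5):
    """Rank references by the fraction of their k-mers found in the reads.

    This is a generic k-mer containment score, used here only to
    illustrate the idea; it is not the scoring used by SeroBA.
    """
    # Pool the k-mers from every read into one set.
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    scores = {}
    for name, ref in references.items():
        ref_kmers = kmers(ref, k)
        # Guard against references shorter than k.
        if ref_kmers:
            scores[name] = len(ref_kmers & read_kmers) / len(ref_kmers)
        else:
            scores[name] = 0.0

    # Highest containment first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical sequences, not real cps loci).
references = {
    "serotype_A": "ATGGCTAGCTAGGATCCGATCGTAGCTAGCTAACG",
    "serotype_B": "ATGCCCGGGTTTAAACCCGGGTTTAAAGGGCCCTT",
}
reads = ["GCTAGCTAGGATCCG", "GATCCGATCGTAGCT", "CGTAGCTAGCTAACG"]

ranking = rank_references(reads, references)
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

In this toy case all three reads are drawn from the first reference, so it ranks first. A real tool would also have to handle sequencing errors, reverse complements and closely related serotypes, none of which this sketch attempts.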
Pages: 6
Related papers
50 items in total
  • [41] A comparative analysis of tissue gene expression data from high-throughput studies
    Ping Jie
    Wang YaJun
    Yu Yao
    Li YiXue
    Li Xuan
    Hao Pei
    CHINESE SCIENCE BULLETIN, 2012, 57 (22): : 2920 - 2927
  • [42] WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers From Whole Genomes
    Pandey, Manmohan
    Kumar, Ravindra
    Srivastava, Prachi
    Agarwal, Suyash
    Srivastava, Shreya
    Nagpure, Naresh S.
    Jena, Joy K.
    Kushwaha, Basdeo
    JOURNAL OF HEREDITY, 2018, 109 (03) : 339 - 343
  • [43] Microsatellite markers for the dinoflagellate Gambierdiscus caribaeus from high-throughput sequencing data
    Sassenhagen, Ingrid
    Erdner, Deana L.
    JOURNAL OF APPLIED PHYCOLOGY, 2017, 29 (04) : 1927 - 1932
  • [44] ALPHLARD: a Bayesian method for analyzing HLA genes from whole genome sequence data
    Shuto Hayashi
    Rui Yamaguchi
    Shinichi Mizuno
    Mitsuhiro Komura
    Satoru Miyano
    Hidewaki Nakagawa
    Seiya Imoto
    BMC Genomics, 19
  • [45] ALPHLARD: a Bayesian method for analyzing HLA genes from whole genome sequence data
    Hayashi, Shuto
    Yamaguchi, Rui
    Mizuno, Shinichi
    Komura, Mitsuhiro
    Miyano, Satoru
    Nakagawa, Hidewaki
    Imoto, Seiya
    BMC GENOMICS, 2018, 19
  • [46] Genetic Characterisation of Malawian Pneumococci Prior to the Roll-Out of the PCV13 Vaccine Using a High-Throughput Whole Genome Sequencing Approach
    Everett, Dean B.
    Cornick, Jennifer
    Denis, Brigitte
    Chewapreecha, Claire
    Croucher, Nicholas
    Harris, Simon
    Parkhill, Julian
    Gordon, Stephen
    Carrol, Enitan D.
    French, Neil
    Heyderman, Robert S.
    Bentley, Stephen D.
    PLOS ONE, 2012, 7 (09):
  • [47] Analysis of Simple Sequence Repeat of Three Kinds of Lamb Mixed Genome Sequences Based on High-throughput Prediction Technology
    Geng D.
    Luo R.
    Wang L.
    Gao S.
    Song Y.
    Journal of Chinese Institute of Food Science and Technology, 2020, 20 (09) : 275 - 285
  • [48] SEQUENCE TYPES AND ANTIMICROBIAL SUSCEPTIBILITY OF INVASIVE STREPTOCOCCUS PNEUMONIAE ISOLATES FROM A REGION WITH HIGH ANTIBIOTIC SELECTIVE PRESSURE AND SUBOPTIMAL VACCINE COVERAGE
    Janapatla, Rajendra-Prasad
    Hsu, Mei-Hua
    Du, Jia-Fu
    Hsieh, Yu-Chia
    Lin, Tzou-Yien
    Chiu, Cheng-Hsun
    PEDIATRIC INFECTIOUS DISEASE JOURNAL, 2010, 29 (05) : 467 - 469
  • [49] Rapid, high-throughput, cost-effective whole-genome sequencing of SARS-CoV-2 using a condensed library preparation of the Illumina DNA Prep kit
    Hickman, Rebecca
    Nguyen, Jason
    Lee, Tracy D.
    Tyson, John R.
    Azana, Robert
    Tsang, Frankie
    Hoang, Linda
    Prystajecky, Natalie A.
    JOURNAL OF CLINICAL MICROBIOLOGY, 2024, 62 (03)
  • [50] Whole-Genome Sequencing of Chinese Yellow Catfish Provides a Valuable Genetic Resource for High-Throughput Identification of Toxin Genes
    Zhang, Shiyong
    Li, Jia
    Qin, Qin
    Liu, Wei
    Bian, Chao
    Yi, Yunhai
    Wang, Minghua
    Zhong, Liqiang
    You, Xinxin
    Tang, Shengkai
    Liu, Yanshan
    Huang, Yu
    Gu, Ruobo
    Xu, Junmin
    Bian, Wenji
    Shi, Qiong
    Chen, Xiaohui
    TOXINS, 2018, 10 (12)