Staphylococcus aureus viewed from the perspective of 40,000+ genomes

Cited by: 73
Authors
Petit, Robert A., III [1 ]
Read, Timothy D. [1 ]
Affiliations
[1] Emory Univ, Dept Med, Div Infect Dis, Sch Med, Atlanta, GA 30322 USA
Funding
National Institutes of Health (USA)
Keywords
Database; MSSA; MRSA; Antibiotic resistance; MLST; S. aureus; CASSETTE CHROMOSOME MEC; ANTIBIOTIC-RESISTANCE; GENOME ANALYSIS; ANNOTATION; ALGORITHM; DATABASE; GENES; TREE; CCR
DOI
10.7717/peerj.5261
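Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract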
Low-cost Illumina sequencing of clinically important bacterial pathogens has generated thousands of publicly available genomic datasets. Analyzing these genomes and extracting relevant information for each pathogen and its associated clinical phenotypes requires not only resources and bioinformatic skills but also organism-specific knowledge. In light of these issues, we created Staphopia, an analysis pipeline, database and application programming interface focused on Staphylococcus aureus, a common colonizer of humans and a major antibiotic-resistant pathogen responsible for a wide spectrum of hospital- and community-associated infections. Written in Python, Staphopia's analysis pipeline consists of submodules running open-source tools. It accepts raw FASTQ reads as input, which undergo quality control filtration, error correction and reduction to a maximum of approximately 100x chromosome coverage. This downsampling significantly reduces total runtime without detrimentally affecting the results. The pipeline performs both de novo assembly-based and mapping-based analysis. Automated gene calling and annotation are performed on the assembled contigs. Read mapping is used to call variants (single nucleotide polymorphisms and insertions/deletions) against a reference S. aureus chromosome (N315, ST5). We ran the analysis pipeline on more than 43,000 S. aureus shotgun Illumina genome projects in the public European Nucleotide Archive database in November 2017. We found that only a quarter of known multi-locus sequence types (STs) were represented, but the top 10 STs made up 70% of all genomes. Methicillin-resistant S. aureus (MRSA) accounted for 64% of all genomes. Using the Staphopia database, we selected 380 high-quality genomes deposited with good metadata, each from a different multi-locus ST, as a non-redundant diversity set for studying S. aureus evolution.
In addition to answering basic science questions, Staphopia could serve as a potential platform for rapid clinical diagnostics of S. aureus isolates in the future. The system could also be adapted as a template for other organism-specific databases.
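The coverage reduction step described in the abstract can be illustrated with a minimal Python sketch. This is not Staphopia's own code: the function names, the random per-read sampling strategy, and the approximate 2.8 Mb chromosome length used in the usage note are all assumptions made for illustration. The sketch only shows the arithmetic of capping read depth at roughly 100x.

```python
import random


def subsample_fraction(total_bases, genome_size, target_coverage=100.0):
    """Return the fraction of reads to keep so coverage is at most the target.

    Hypothetical helper: coverage is estimated as total sequenced bases
    divided by the reference chromosome length.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    coverage = total_bases / genome_size
    if coverage <= target_coverage:
        return 1.0  # already at or below the cap, keep everything
    return target_coverage / coverage


def subsample_reads(reads, genome_size, target_coverage=100.0, seed=0):
    """Randomly keep each read with the probability computed above.

    `reads` is a list of sequence strings; a real pipeline would stream
    FASTQ records instead of holding them in memory (assumption).
    """
    total_bases = sum(len(r) for r in reads)
    fraction = subsample_fraction(total_bases, genome_size, target_coverage)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return [r for r in reads if rng.random() < fraction]
```

For example, with an assumed chromosome length of about 2.8 Mb, a dataset of 560 Mb of sequence corresponds to roughly 200x coverage, so `subsample_fraction(560_000_000, 2_800_000)` returns 0.5 and about half of the reads would be retained.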
Pages: 20