snpTree - a web-server to identify and construct SNP trees from whole genome sequence data

Cited by: 52
Authors
Leekitcharoenphon, Pimlapas [1, 2]
Kaas, Rolf S. [1, 2]
Thomsen, Martin Christen Frolund [2]
Friis, Carsten [1]
Rasmussen, Simon [2]
Aarestrup, Frank M. [1]
Affiliations
[1] Tech Univ Denmark, Nat Food Inst, DK-2800 Lyngby, Denmark
[2] Tech Univ Denmark, Ctr Biol Sequence Anal, Dept Syst Biol, DK-2800 Lyngby, Denmark
Keywords
EPIDEMIOLOGY; ALGORITHMS; ALIGNMENT; EVOLUTION; FORMAT;
DOI
10.1186/1471-2164-13-S7-S6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Advances in whole genome sequencing (WGS) and its decreasing cost will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity, so extensive genomic analysis is required to differentiate and classify them. One successful and broadly used method is the analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs, each with various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard, simple and automatic tool to determine SNPs and construct phylogenetic trees from WGS data. Results: Here we introduce snpTree, a server for automatic online SNP analysis. The tool is composed of different SNP analysis suites together with Perl and Python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS data as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA, while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on their position on the reference genome, and a tree is constructed from the concatenated SNPs using FastTree and a Perl script. The online server was implemented in HTML, Java and Python. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the last case, the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. Conclusions: The snpTree server is an easy-to-use option for rapid, standardised and automatic SNP analysis in epidemiological studies, including for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
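The concatenation step described in the abstract (SNPs ordered by position on the reference genome, then joined into one sequence per isolate for tree building) can be illustrated with a minimal Python sketch. This is not the snpTree code: the function names, the input layout (a dictionary of 1-based position to called base per isolate) and the choice to fill positions without a call with the reference base are all assumptions made for illustration.

```python
from typing import Dict


def concatenate_snps(reference: str,
                     snps_by_isolate: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Build one concatenated SNP string per isolate (hypothetical helper).

    reference: reference genome sequence.
    snps_by_isolate: isolate name -> {1-based reference position: called base}.
    """
    # Union of all variable positions, ordered along the reference genome.
    positions = sorted({pos for calls in snps_by_isolate.values() for pos in calls})
    alignment = {}
    for name, calls in snps_by_isolate.items():
        # Assumption: an isolate with no SNP call at a position carries the reference base.
        alignment[name] = "".join(calls.get(pos, reference[pos - 1]) for pos in positions)
    return alignment


def to_fasta(alignment: Dict[str, str]) -> str:
    """Serialise the concatenated SNPs as FASTA text, a common input for tree builders."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


if __name__ == "__main__":
    ref = "ACGTACGT"
    calls = {"iso1": {2: "T", 5: "G"}, "iso2": {5: "G", 8: "A"}}
    print(to_fasta(concatenate_snps(ref, calls)), end="")
```

Every isolate ends up with a sequence of the same length, one column per variable position, which is the form a tree-building program expects from an alignment.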
Pages: 8