snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing

Times cited: 6
Authors
Al-Shahib, Ali [1]
Underwood, Anthony [1]
Affiliation
[1] Public Health England, Microbiology Services, Applied Bioinformatics & Laboratory Informatics Unit, London NW9 5EQ, England
Keywords
Single Nucleotide Polymorphisms (SNP); Variant Call Format (VCF); SQL database; High-throughput Sequencing; Next Generation Sequencing (NGS); Ruby; Phylogeny; Management
DOI
10.1186/1471-2105-14-326
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: A typical bacterial pathogen genome mapping project can identify thousands of single nucleotide polymorphisms (SNPs). Interpreting SNP data is complex, and it is difficult to conceptualise the data contained within the large flat files that are the typical output of most SNP-calling algorithms. One solution to this problem is to construct a database that can be queried using simple commands, so that SNP interrogation and output are both easy and comprehensible.

Results: Here we present snp-search, a tool that manages SNP data and allows it to be manipulated and searched. After a SNP database has been created from a VCF file, snp-search can be used to convert selected SNP data into FASTA sequences, construct phylogenies, look for unique SNPs, and output contextual information about each SNP. The FASTA output from snp-search is particularly useful for generating robust phylogenetic trees based on SNP differences across the conserved positions in whole genomes. Queries can be designed to answer critical genomic questions, such as the association of SNPs with particular phenotypes.

Conclusions: snp-search is a tool that manages SNP data and outputs useful information that can be used to test important biological hypotheses.
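The workflow the abstract describes (load a VCF file into a SQL database, then query it to emit a FASTA alignment of SNP positions or to list SNPs unique to one isolate) can be sketched as follows. This is a minimal illustration using Python's standard-library `sqlite3` module, not snp-search's own implementation. The two-table schema, the function names `load_vcf`, `snps_to_fasta` and `unique_snps`, and the toy three-sample VCF are all hypothetical. The parser also assumes haploid calls with GT as the first FORMAT key, as in a bacterial multi-sample VCF.

```python
import sqlite3

# Hypothetical toy multi-sample VCF: haploid calls, GT is the only FORMAT key.
TOY_VCF = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
chr\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0\t1\t1
chr\t250\t.\tC\tT\t60\tPASS\t.\tGT\t1\t0\t0
chr\t400\t.\tG\tA\t70\tPASS\t.\tGT\t0\t0\t1
"""


def load_vcf(conn, vcf_text):
    """Store each SNP, and every sample's base at that SNP, in two SQL tables."""
    conn.executescript("""
        CREATE TABLE snps (id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER,
                           ref TEXT, alt TEXT);
        CREATE TABLE genotypes (snp_id INTEGER REFERENCES snps(id),
                                sample TEXT, base TEXT);
    """)
    samples = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        cur = conn.execute(
            "INSERT INTO snps (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
            (chrom, int(pos), ref, alt))
        alts = alt.split(",")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]  # assumes GT is the first FORMAT key
            if gt == ".":
                base = "N"  # missing call
            elif gt == "0":
                base = ref
            else:
                base = alts[int(gt) - 1]  # haploid allele index into ALT
            conn.execute("INSERT INTO genotypes VALUES (?, ?, ?)",
                         (cur.lastrowid, sample, base))
    conn.commit()
    return samples


def snps_to_fasta(conn, samples):
    """Concatenate each sample's bases at all SNP positions into one FASTA record."""
    lines = []
    for sample in samples:
        rows = conn.execute(
            "SELECT g.base FROM genotypes g JOIN snps s ON s.id = g.snp_id "
            "WHERE g.sample = ? ORDER BY s.chrom, s.pos", (sample,)).fetchall()
        lines.append(f">{sample}")
        lines.append("".join(base for (base,) in rows))
    return "\n".join(lines) + "\n"


def unique_snps(conn, sample):
    """Positions at which `sample` is the only sample with a non-reference base."""
    rows = conn.execute("""
        SELECT s.pos FROM snps s JOIN genotypes g ON g.snp_id = s.id
        WHERE g.base != s.ref AND g.base != 'N'
        GROUP BY s.id, s.pos
        HAVING COUNT(*) = 1 AND MAX(g.sample) = ?
        ORDER BY s.pos""", (sample,))
    return [pos for (pos,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    samples = load_vcf(conn, TOY_VCF)
    print(snps_to_fasta(conn, samples), end="")
    print("unique to S3:", unique_snps(conn, "S3"))
```

Running the script prints a three-sequence FASTA alignment of the variable positions only (`ATG`, `GCG`, `GCA`). An alignment of this kind could be passed to a tree-building program. The script then reports position 400 as unique to S3. Keeping one row per sample per SNP makes questions such as "which SNPs are carried only by these isolates?" plain SQL aggregates.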
Pages: 8