BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data

Cited by: 6
Authors
Neumann, Ralf Stefan [1 ,2 ]
Kumar, Surendra [1 ,2 ]
Haverkamp, Thomas Hendricus Augustus [3 ]
Shalchian-Tabrizi, Kamran [1 ,2 ]
Affiliations
[1] Univ Oslo, Sect Genet & Evolutionary Biol EVOGENE, Oslo, Norway
[2] Univ Oslo, CEDE, Oslo, Norway
[3] Univ Oslo, Dept Biosci, Ctr Ecol & Evolutionary Synth, Oslo, Norway
Keywords
Analysis; BLAST; High-throughput; Taxonomy; Text-mining; Visualization; DISCOVERY; TAXONOMY; OUTPUT;
DOI
10.1186/1471-2105-15-128
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, which now include many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries is ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide only a basic structuring of BLAST data.
Results: Here we present a bioinformatics application named BLASTGrabber suitable for high-throughput sequencing analysis. BLASTGrabber is implemented as a Java application, so it is OS-independent, and it includes a user-friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third-party functionality.
Conclusion: The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text-mining capabilities and generic multi-dimensional rendering of BLAST hits. The program is aimed at an audience without expert computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations.
Pages: 11