KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis

Cited by: 4
Authors
Hastreiter, Maximilian [1]
Jeske, Tim [1]
Hoser, Jonathan [1]
Kluge, Michael [1]
Ahomaa, Kaarin [1]
Friedl, Marie-Sophie [1]
Kopetzky, Sebastian J. [1]
Quell, Jan-Dominik [1]
Mewes, H.-Werner [1]
Kueffner, Robert [1]
Affiliations
[1] Helmholtz Zentrum Munchen, Inst Bioinformat & Syst Biol, D-85764 Neuherberg, Germany
DOI
10.1093/bioinformatics/btx003
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Summary: Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME.
Pages: 1565-1567
Number of pages: 3