Wrangling Galaxy's reference data

Cited by: 23
Authors
Blankenberg, Daniel [1]
Johnson, James E. [2]
Taylor, James [3, 4]
Nekrutenko, Anton [1]
Affiliations
[1] Penn State Univ, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[2] Univ Minnesota, Minnesota Supercomp Inst, Minneapolis, MN 55455 USA
[3] Emory Univ, Dept Biol, Atlanta, GA 30322 USA
[4] Emory Univ, Dept Math & Comp Sci, Atlanta, GA 30322 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/bioinformatics/btu119
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which offers the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.
Availability and implementation: The Galaxy Data Manager framework is implemented in Python and has been integrated as part of the core Galaxy platform. Individual Data Manager tools can be defined locally or installed from a ToolShed, allowing the Galaxy community to define additional Data Manager tools as needed, with full versioning and dependency support.
Pages: 1917-1919
Page count: 3