Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

Cited by: 83
Authors
Afgan, Enis [1, 2, 3]
Sloggett, Clare [1 ]
Goonasekera, Nuwan [1 ]
Makunin, Igor [4 ]
Benson, Derek [4 ]
Crowe, Mark [5 ]
Gladman, Simon [1 ]
Kowsar, Yousef [1 ]
Pheasant, Michael [4 ]
Horst, Ron [4 ]
Lonie, Andrew [1 ]
Affiliations
[1] Univ Melbourne, Victorian Life Sci Computat Initiat VLSCI, Melbourne, Vic, Australia
[2] Johns Hopkins Univ, Dept Biol, Baltimore, MD 21218 USA
[3] Rudjer Boskovic Inst, Ctr Comp & Informat CIR, Zagreb, Croatia
[4] Univ Queensland, Ctr Res Comp, Brisbane, Qld, Australia
[5] Univ Queensland, Queensland Facil Adv Bioinformat QFAB, Brisbane, Qld, Australia
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; COMPUTATIONAL SOLUTIONS; DATA-MANAGEMENT; GALAXY; GENE; SOFTWARE; PLATFORM; BROWSER; BIOCONDUCTOR; SYSTEM
DOI
10.1371/journal.pone.0140829
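Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09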
Abstract
Background: Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.
Results: We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.
Conclusions: This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
Pages: 20