Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community

Cited by: 100
Authors
Krampis, Konstantinos [1]
Booth, Tim [2]
Chapman, Brad [3]
Tiwari, Bela [4]
Bicak, Mesude [2]
Field, Dawn [2]
Nelson, Karen E. [1]
Affiliations
[1] J Craig Venter Inst, Rockville, MD 20850 USA
[2] CEH Wallingford, Wallingford, Oxon, England
[3] Harvard Univ, Sch Publ Hlth, Bioinformat Core, Boston, MA 02115 USA
[4] CLC Bio, DK-8200 Aarhus N, Denmark
Keywords
Bioinformatics - Investments - Network security - Virtual machine - Application programs - Biochemistry
DOI
10.1186/1471-2105-13-42
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: A steep drop in the cost of next-generation sequencing in recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure.
Results: Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and a VirtualBox Appliance are also publicly available for download and use by researchers with access to private clouds.
Conclusions: Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application-specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
Pages: 8