MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants

Cited by: 10
Authors
Elshazly, Hatem [1]
Souilmi, Yassine [3, 4]
Tonellato, Peter J. [4, 5]
Wall, Dennis P. [6, 7]
Abouelhoda, Mohamed [1, 2]
Affiliations
[1] Nile Univ, Ctr Informat Sci, Juhayna Sq, Giza, Egypt
[2] Cairo Univ, Fac Engn, Syst & Biomed Engn Dept, Giza, Egypt
[3] Mohamed Vth Univ Rabat, Dept Biol, 4 Ibn Battouta Ave,BP 1014RP, Rabat, Morocco
[4] Harvard Med Sch, Dept Biomed Informat, 10 Shattuck St, Boston, MA 02115 USA
[5] Harvard Med Sch, Brigham & Womens Hosp, Dept Pathol, Boston, MA 02215 USA
[6] Stanford Univ, Dept Pediat & Psychiat, Div Syst Med, Stanford, CA 94305 USA
[7] Stanford Univ, Program Biomed Informat, Stanford, CA 94305 USA
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Funding
National Institutes of Health (USA);
Keywords
Variant analysis; Cloud computing; Multicloud; Sequence analysis; Personalized medicine; SEQUENCING DATA; CLOUD; FRAMEWORK; WORKFLOWS;
DOI
10.1186/s12859-016-1454-2
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Next-generation genome sequencing techniques have become affordable for massive sequencing efforts devoted to the clinical characterization of human diseases. However, the cost of cloud-based analysis of the mounting datasets remains a bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools it uses in order to reduce the overall processing time and, with it, the processing cost. It is also important to capitalize on recent developments in the cloud computing market, which has seen more providers competing on products and prices.
Results: In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. The package supports the Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. It allows execution scenarios of different levels of sophistication, up to one in which a workflow is executed on a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios that exploit the Amazon spot instance model in combination with other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers.
Conclusions: MC-GenomeKey provides an efficient multicloud-based solution to detect and annotate mutations. The package can run on different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means of using the low-cost Amazon spot instance model, as it provides an efficient solution to the sudden termination of spot machines caused by a sudden price increase. The package has a web interface and is available free of charge for academic use.
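The cost idea in the abstract (prefer the cheapest offer, and recover when a spot machine is terminated) can be illustrated with a minimal Python sketch. This is not MC-GenomeKey's implementation or API: the `Offer` type, the `cheapest` and `run_with_fallback` functions, the provider labels, the task names, and the prices are all hypothetical, and each task is assumed to take one node-hour. Work lost on the terminated node is not billed in this simplified model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str        # label only, e.g. "amazon-spot" (hypothetical)
    hourly_price: float  # illustrative price per node-hour, not a real quote
    preemptible: bool    # True if the provider may terminate the node


def cheapest(offers, allow_preemptible=True):
    """Return the lowest-priced offer, optionally skipping preemptible ones."""
    candidates = [o for o in offers if allow_preemptible or not o.preemptible]
    if not candidates:
        raise ValueError("no eligible offers")
    return min(candidates, key=lambda o: o.hourly_price)


def run_with_fallback(tasks, offers, terminated_at=None):
    """Place tasks on the cheapest offer, falling back after a termination.

    terminated_at is the index of the task at which the preemptible node
    is lost (None means no termination). From that task onward, work is
    placed on the cheapest non-preemptible offer. Returns the list of
    (task, provider) pairs and the total cost at one node-hour per task.
    """
    placement = []
    cost = 0.0
    current = cheapest(offers)
    for i, task in enumerate(tasks):
        if terminated_at is not None and i == terminated_at and current.preemptible:
            # Simulated spot termination: move to a node that cannot be reclaimed.
            current = cheapest(offers, allow_preemptible=False)
        placement.append((task, current.provider))
        cost += current.hourly_price
    return placement, cost


# Hypothetical offers and workflow stages, for illustration only.
offers = [
    Offer("amazon-spot", 0.10, True),
    Offer("google", 0.30, False),
    Offer("azure", 0.35, False),
]
tasks = ["align", "call_variants", "annotate"]
placement, cost = run_with_fallback(tasks, offers, terminated_at=1)
```

In this toy run the first stage lands on the spot offer, and the remaining stages move to the cheapest non-preemptible offer once the simulated termination occurs. A real system would also need to checkpoint intermediate results and transfer data between clouds, which this sketch leaves out.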
Pages: 14