MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants

Cited by: 10
Authors
Elshazly, Hatem [1]
Souilmi, Yassine [3, 4]
Tonellato, Peter J. [4, 5]
Wall, Dennis P. [6, 7]
Abouelhoda, Mohamed [1, 2]
Affiliations
[1] Nile Univ, Ctr Informat Sci, Juhayna Sq, Giza, Egypt
[2] Cairo Univ, Fac Engn, Syst & Biomed Engn Dept, Giza, Egypt
[3] Mohamed Vth Univ Rabat, Dept Biol, 4 Ibn Battouta Ave,BP 1014RP, Rabat, Morocco
[4] Harvard Med Sch, Dept Biomed Informat, 10 Shattuck St, Boston, MA 02115 USA
[5] Harvard Med Sch, Brigham & Womens Hosp, Dept Pathol, Boston, MA 02215 USA
[6] Stanford Univ, Dept Pediat & Psychiat, Div Syst Med, Stanford, CA 94305 USA
[7] Stanford Univ, Program Biomed Informat, Stanford, CA 94305 USA
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Funding
National Institutes of Health (USA)
Keywords
Variant analysis; Cloud computing; Multicloud; Sequence analysis; Personalized medicine; SEQUENCING DATA; CLOUD; FRAMEWORK; WORKFLOWS;
DOI
10.1186/s12859-016-1454-2
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Next-generation genome sequencing techniques have become affordable for massive sequencing efforts devoted to the clinical characterization of human diseases. However, the cost of cloud-based analysis of the mounting datasets remains a bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools it uses, reducing the overall processing time and, with it, the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices.
Results: In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports the Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. It allows execution scenarios of different levels of sophistication, up to one in which a workflow is executed on a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios that exploit the spot instance model of Amazon in combination with other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers.
Conclusions: MC-GenomeKey provides an efficient multicloud-based solution to detect and annotate mutations. The package can run on different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means of using the low-cost spot instance model of Amazon, as it handles the sudden termination of spot machines that results from a sudden price increase. The package has a web interface and is available free of charge for academic use.
Pages: 14