Serverless computing in omics data analysis and integration

Cited by: 19
Authors
Grzesik, Piotr [1]
Augustyn, Dariusz R. [1]
Wycislik, Lukasz [1]
Mrozek, Dariusz [2, 3]
Affiliations
[1] Silesian Tech Univ, Dept Appl Informat, PL-44100 Gliwice, Poland
[2] Silesian Tech Univ, Dept Appl Informat, Fac Automat Control Elect & Comp Sci, Gliwice, Poland
[3] Silesian Tech Univ, Cooperat & Dev, Fac Automat Control Elect & Comp Sci, Gliwice, Poland
Keywords
cloud computing; serverless computing; omics data processing; omics data integration; function-as-a-service; container-as-a-service; bioinformatics; CLOUD; ALGORITHM; ALIGNMENT; WORKFLOWS; TAVERNA; TOOL;
DOI
10.1093/bib/bbab349
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
A comprehensive analysis of omics data can require vast computational resources and access to varied data sources that must be integrated into complex, multi-step analysis pipelines. Execution of many such analyses can be accelerated by applying the cloud computing paradigm, which provides scalable resources for storing data of different types and parallelizing data analysis computations. Moreover, these resources can be reused for different multi-omics analysis scenarios. Traditionally, developers are required to manage a cloud platform's underlying infrastructure, configuration, maintenance and capacity planning. The serverless computing paradigm simplifies these operations by automatically allocating and maintaining both servers and virtual machines, as required for analysis tasks. This paradigm offers highly parallel execution and high scalability without manual management of the underlying infrastructure, freeing developers to focus on operational logic. This paper reviews serverless solutions in bioinformatics and evaluates their usage in omics data analysis and integration. We start by reviewing the application of the cloud computing model to multi-omics data analysis and exposing some shortcomings of the early approaches. We then introduce the serverless computing paradigm and show its applicability for performing an integrative analysis of multiple omics data sources in the context of the COVID-19 pandemic.
Pages: 11
Related papers
52 in total
[1]  
Aboukhalil R, SERVERLESS GENOMICS
[2]   CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing [J].
Angiuoli, Samuel V. ;
Matalka, Malcolm ;
Gussman, Aaron ;
Galens, Kevin ;
Vangala, Mahesh ;
Riley, David R. ;
Arze, Cesar ;
White, James R. ;
White, Owen ;
Fricke, W. Florian .
BMC BIOINFORMATICS, 2011, 12
[3]  
[Anonymous], ANN AM EL COMP CLOUD
[4]   Elastic Scheduling of Scientific Workflows under Deadline Constraints in Cloud Computing Environments [J].
Anwar, Nazia ;
Deng, Huifang .
FUTURE INTERNET, 2018, 10 (01)
[5]   Perspectives of using Cloud computing in integrative analysis of multi-omics data [J].
Augustyn, Dariusz R. ;
Wycislik, Lukasz ;
Mrozek, Dariusz .
BRIEFINGS IN FUNCTIONAL GENOMICS, 2021, 20 (04) :198-206
[6]  
Ayres DL, 2012, SYST BIOL, V61, P170, DOI [10.1093/sysbio/syr100, 10.1093/sysbio/sys029]
[7]  
Baele G, 2019, METHODS MOL BIOL, V1910, P691, DOI 10.1007/978-1-4939-9074-0_23
[8]  
Baker D., 2020, GLOBAL OMICS DATA SH
[9]   ClickGene: an open cloud-based platform for big pan-cancer data genome-wide association study, visualization and exploration [J].
Bi, Jia-Hao ;
Tong, Yi-Fan ;
Qiu, Zhe-Wei ;
Yang, Xing-Feng ;
Minna, John ;
Gazdar, Adi F. ;
Song, Kai .
BIODATA MINING, 2019, 12 (1)
[10]  
Birger C, 2017, 209494 BIORXIV, P1