Next-generation sequencing data analysis on cloud computing

Cited by: 13
Authors
Kwon, Taesoo [1, 2]
Yoo, Won Gi [2 ]
Lee, Won-Ja [3 ]
Kim, Won [1 ]
Kim, Dae-Won [2 ]
Affiliations
[1] Seoul Natl Univ, Sch Biol Sci, Seoul 151742, South Korea
[2] Korea Natl Inst Hlth, Korea Ctr Dis Control & Prevent, Div Biosafety Evaluat & Control, Chungbuk 363951, South Korea
[3] Korea Natl Inst Hlth, Div Arboviruses, Korea Ctr Dis Control & Prevent, Chungbuk 363951, South Korea
Keywords
Next-generation sequencing; Cloud computing; Virtualization; MapReduce; High performance computing; ChIP-seq data; RNA-seq; Bioinformatics; Framework; Tool
DOI
10.1007/s13258-015-0280-7
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
With the advent of next-generation sequencing (NGS), including whole genome sequencing (WGS), RNA sequencing (RNA-seq), and chromatin immunoprecipitation followed by sequencing (ChIP-seq), many biologists and computer scientists are highlighting the urgent need for computing power, storage, and bioinformatics software for analyzing large quantities of sequence data. Building and maintaining the computational infrastructure required for massive data processing is currently among the most important tasks. However, technology platforms for handling large amounts of information pose multiple challenges for data access and processing. To overcome these challenges, cloud computing technologies are emerging as a possible infrastructure for meeting the intensive demands on computing power and communication resources in NGS data analysis. In this review, we explain the concepts and key technologies of cloud computing, such as Map and Reduce, and discuss the problem of data transfer. To assess the performance and usefulness of these technologies, we analyzed NGS data on cloud platforms and compared the results with those from a local cluster. From the benchmark results, we concluded that cloud computing is still more expensive than a local cluster, but provides reasonable performance for NGS data analysis at acceptable prices and could be a good alternative to local cluster systems.
Pages: 489-501
Page count: 13