Harnessing virtual machines to simplify next-generation DNA sequencing analysis

Cited by: 19
Authors
Nocq, Julie [1, 2]
Celton, Magalie [1, 2, 3]
Gendron, Patrick [1]
Lemieux, Sebastien [1, 4]
Wilhelm, Brian T. [1, 2]
Affiliations
[1] Univ Montreal, Inst Res Immunol & Canc, Montreal, PQ H3T 1J4, Canada
[2] Univ Montreal, Dept Med, Lab High Throughput Genom, Montreal, PQ H3T 1J4, Canada
[3] INRA, UMR1083, F-34060 Montpellier, France
[4] Univ Montreal, Lab Funct & Struct Bioinformat Comp Sci & Operat, Montreal, PQ H3T 1J4, Canada
Keywords
RNA-SEQ; DIFFERENTIAL EXPRESSION; ALIGNMENT; TOOL; ANNOTATION; FRAMEWORK; PACKAGE; TOPHAT; GENE
DOI
10.1093/bioinformatics/btt352
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The growth of next-generation sequencing (NGS) has not only dramatically accelerated the pace of research in the field of genomics, but it has also opened the door to personalized medicine and diagnostics. The resulting flood of data has led to the rapid development of large numbers of bioinformatic tools for data analysis, creating a challenging situation for researchers when choosing and configuring a variety of software for their analysis, and for other researchers trying to replicate their analysis. As NGS technology continues to expand from the research environment into clinical laboratories, the challenges associated with data analysis have the potential to slow the adoption of this technology.
Results: Here we discuss the potential of virtual machines (VMs) to be used as a method for sharing entire installations of NGS software (bioinformatic 'pipelines'). VMs are created by programs designed to allow multiple operating systems to co-exist on a single physical machine, and they can be made following the object-oriented paradigm of encapsulating data and methods together. This allows NGS data to be distributed within a VM, along with the pre-configured software for its analysis. Although VMs have historically suffered from poor performance relative to native operating systems, we present benchmarking results demonstrating that this reduced performance can now be minimized. We further discuss the many potential benefits of VMs as a solution for NGS analysis and describe several published examples. Lastly, we consider the benefits of VMs in facilitating the introduction of NGS technology into the clinical environment.
Pages: 2075-2083
Page count: 9