Facing the Genome Data Deluge: Efficiently Identifying Genetic Variants with In-Memory Database Technology

Cited by: 0
Authors
Faehnrich, Cindy [1]
Schapranow, Matthieu-P. [1]
Plattner, Hasso [1]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, August Bebel Str 88, D-14482 Potsdam, Germany
Source
30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II | 2015
Keywords
Genome Data Analysis; Variant Calling; Single Nucleotide Polymorphism; In-Memory Database Technology;
DOI
10.1145/2695664.2695836
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Next-generation sequencing enables whole-genome sequencing within a few hours at minimal cost. However, this technology poses new challenges for computational genome analysis, which must efficiently process a growing volume of error-prone data. In this work, we address these challenges for the identification of Single Nucleotide Polymorphisms, one type of genetic variant in genome data. We propose using a column-store in-memory database for efficient data processing, benefiting from built-in compression and parallelization techniques and from accessing data directly in main memory rather than on slower disk storage. We provide a statistical model that is sensitive to input data quality and uses knowledge from population studies. Comparisons with state-of-the-art tools show that our approach is on average orders of magnitude faster than traditional procedures while requiring less administrative effort.
Pages: 18-25
Page count: 8