deBGA: read alignment with de Bruijn graph-based seed and extension

Cited by: 49
Authors
Liu, Bo [1]
Guo, Hongzhe [1]
Brudno, Michael [2, 3, 4]
Wang, Yadong [1]
Affiliations
[1] Harbin Inst Technol, Ctr Bioinformat, Harbin 150001, Heilongjiang, Peoples R China
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[3] Hosp Sick Children, Genet & Genome Biol Program, Toronto, ON M5G 1L7, Canada
[4] Hosp Sick Children, Ctr Computat Med, Toronto, ON M5G 1L7, Canada
Keywords
ACCURATE; SEQUENCE; GENOMES;
DOI
10.1093/bioinformatics/btw371
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: As high-throughput sequencing (HTS) technology becomes ubiquitous and the volume of data continues to rise, HTS read alignment is increasingly the rate-limiting step, which drives the development of novel read alignment approaches. Moreover, promising new applications of HTS technology require aligning reads to multiple genomes instead of a single reference; however, state-of-the-art aligners still cannot viably align large numbers of reads to multiple genomes.
Results: We propose the de Bruijn Graph-based Aligner (deBGA), an innovative graph-based seed-and-extension algorithm that aligns HTS reads to a reference genome organized and indexed using a de Bruijn graph. Because it handles repeats well, deBGA is substantially faster than state-of-the-art approaches while maintaining similar or higher sensitivity and accuracy. This makes it particularly well suited to the rapidly growing volumes of sequencing data. Furthermore, it provides a promising solution for aligning reads to multiple genomes and graph-based references in HTS applications.
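The general idea of seeding against a k-mer index in which each distinct k-mer is stored once, then extending the candidate hits, can be illustrated with a minimal sketch. This is not the deBGA implementation: the function names (build_index, successors, align), the node-centric graph definition, the diagonal voting step, and the Hamming-distance extension are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map each distinct k-mer (one de Bruijn graph vertex) to all of its
    reference positions. Repeated k-mers share a single entry."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def successors(index, kmer):
    """Node-centric edges (an assumption of this sketch): indexed k-mers
    that overlap `kmer` by k-1 characters."""
    return [kmer[1:] + base for base in "ACGT" if (kmer[1:] + base) in index]


def align(read, reference, index, k, max_mismatches=2):
    """Seed with exact k-mer hits, then extend by ungapped comparison.

    Returns (start, mismatches) for the best candidate, or None.
    """
    # Seeding: every k-mer hit votes for the implied read start position.
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            votes[pos - j] += 1

    # Extension: verify candidates by counting mismatches (hypothetical
    # stand-in for a real gapped extension step).
    best = None
    for start, _ in votes.most_common():
        if start < 0 or start + len(read) > len(reference):
            continue
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


reference = "ACGTACGTTTGACCA"
index = build_index(reference, 4)
hit = align("GTTTGACC", reference, index, 4)
```

In this toy reference the repeated k-mer ACGT occupies one index entry holding both of its positions, which hints at why collapsing repeats into single vertices can reduce redundant seeding work.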
Pages: 3224-3232
Page count: 9