Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers

Cited by: 30
Authors
Medvedev, Paul [1 ]
Pham, Son [1 ]
Chaisson, Mark [3 ]
Tesler, Glenn [2 ]
Pevzner, Pavel [1 ]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
[2] Univ Calif San Diego, Dept Math, San Diego, CA 92103 USA
[3] Pacific Biosci Calif, Menlo Pk, CA USA
Keywords
de Bruijn graphs; fragment assembly; mate pairs; paired de Bruijn graphs; FRAGMENT; READS;
DOI
10.1089/cmb.2011.0151
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The recent proliferation of next-generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next-generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph-based assembler, maintaining all other assembly steps such as error correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
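The abstract describes the paired de Bruijn graph only at a high level. The following Python code is a minimal sketch of the idea under simplifying assumptions: error-free, same-strand read pairs sampled at every genome position (echoing the "simulated perfect data" setting), each contributing a pair of k-mers separated by a fixed gap of d bases. It follows the (k, d)-mer formulation used in later textbook treatments, which may differ in detail from the paper's own construction, and all function names are hypothetical. Each node is a pair of (k-1)-mers, so two genome positions are glued only when both a (k-1)-mer and its mate (k-1)-mer coincide; copies of a repeat whose mates differ therefore stay apart.

```python
from collections import defaultdict

# Hypothetical helpers illustrating a paired de Bruijn graph built from
# error-free (k, d)-mers: a k-mer paired with the k-mer that starts
# k + d positions later (i.e., separated by a gap of d bases).


def paired_kmers(text, k, d):
    """Return every (k, d)-mer of text as a (first, second) tuple."""
    span = 2 * k + d
    return [(text[i:i + k], text[i + k + d:i + span])
            for i in range(len(text) - span + 1)]


def build_paired_graph(pairs):
    """Each (k, d)-mer becomes an edge from its paired (k-1)-mer prefixes
    to its paired (k-1)-mer suffixes; equal node labels are glued."""
    graph = defaultdict(list)
    for a, b in pairs:
        graph[(a[:-1], b[:-1])].append((a[1:], b[1:]))
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; returns one Eulerian path as a node list."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, targets in graph.items():
        out_deg[u] += len(targets)
        for v in targets:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell_paired_path(path, k, d):
    """Spell a string from a path of paired (k-1)-mer nodes, or return
    None if the two spelled halves disagree where they overlap."""
    first = path[0][0] + "".join(node[0][-1] for node in path[1:])
    second = path[0][1] + "".join(node[1][-1] for node in path[1:])
    shift = k + d  # genome offset at which the second half starts
    if first[shift:] != second[:len(second) - shift]:
        return None
    return first + second[len(second) - shift:]


genome = "TAATGCCATGGGATGTT"
graph = build_paired_graph(paired_kmers(genome, k=3, d=1))
print(spell_paired_path(eulerian_path(graph), k=3, d=1))  # TAATGCCATGGGATGTT
```

Hierholzer's algorithm returns a single Eulerian path. In this small example the paired graph admits only one, so the genome is recovered; in general several paths may exist, and an inconsistent one is reported by `spell_paired_path` returning `None`.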
Pages: 1625-1634
Number of pages: 10
References (23 in total)
[1]
Batzoglou, S. .
GENOME RESEARCH, 2002, 12 :177, DOI 10.1101/gr.208902
[2]   Accurate whole human genome sequencing using reversible terminator chemistry [J].
Bentley, David R. ;
Balasubramanian, Shankar ;
Swerdlow, Harold P. ;
Smith, Geoffrey P. ;
Milton, John ;
Brown, Clive G. ;
Hall, Kevin P. ;
Evers, Dirk J. ;
Barnes, Colin L. ;
Bignell, Helen R. ;
Boutell, Jonathan M. ;
Bryant, Jason ;
Carter, Richard J. ;
Cheetham, R. Keira ;
Cox, Anthony J. ;
Ellis, Darren J. ;
Flatbush, Michael R. ;
Gormley, Niall A. ;
Humphray, Sean J. ;
Irving, Leslie J. ;
Karbelashvili, Mirian S. ;
Kirk, Scott M. ;
Li, Heng ;
Liu, Xiaohai ;
Maisinger, Klaus S. ;
Murray, Lisa J. ;
Obradovic, Bojan ;
Ost, Tobias ;
Parkinson, Michael L. ;
Pratt, Mark R. ;
Rasolonjatovo, Isabelle M. J. ;
Reed, Mark T. ;
Rigatti, Roberto ;
Rodighiero, Chiara ;
Ross, Mark T. ;
Sabot, Andrea ;
Sankar, Subramanian V. ;
Scally, Aylwyn ;
Schroth, Gary P. ;
Smith, Mark E. ;
Smith, Vincent P. ;
Spiridou, Anastassia ;
Torrance, Peta E. ;
Tzonev, Svilen S. ;
Vermaas, Eric H. ;
Walter, Klaudia ;
Wu, Xiaolin ;
Zhang, Lu ;
Alam, Mohammed D. ;
Anastasi, Carole .
NATURE, 2008, 456 (7218) :53-59
[3]   ALLPATHS: De novo assembly of whole-genome shotgun microreads [J].
Butler, Jonathan ;
MacCallum, Iain ;
Kleber, Michael ;
Shlyakhter, Ilya A. ;
Belmonte, Matthew K. ;
Lander, Eric S. ;
Nusbaum, Chad ;
Jaffe, David B. .
GENOME RESEARCH, 2008, 18 (05) :810-820
[4]   Short read fragment assembly of bacterial genomes [J].
Chaisson, Mark J. ;
Pevzner, Pavel A. .
GENOME RESEARCH, 2008, 18 (02) :324-330
[5]   De novo fragment assembly with short mate-paired reads: Does the read length matter? [J].
Chaisson, Mark J. ;
Brinza, Dumitru ;
Pevzner, Pavel A. .
GENOME RESEARCH, 2009, 19 (02) :336-346
[6]   Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays [J].
Drmanac, Radoje ;
Sparks, Andrew B. ;
Callow, Matthew J. ;
Halpern, Aaron L. ;
Burns, Norman L. ;
Kermani, Bahram G. ;
Carnevali, Paolo ;
Nazarenko, Igor ;
Nilsen, Geoffrey B. ;
Yeung, George ;
Dahl, Fredrik ;
Fernandez, Andres ;
Staker, Bryan ;
Pant, Krishna P. ;
Baccash, Jonathan ;
Borcherding, Adam P. ;
Brownley, Anushka ;
Cedeno, Ryan ;
Chen, Linsu ;
Chernikoff, Dan ;
Cheung, Alex ;
Chirita, Razvan ;
Curson, Benjamin ;
Ebert, Jessica C. ;
Hacker, Coleen R. ;
Hartlage, Robert ;
Hauser, Brian ;
Huang, Steve ;
Jiang, Yuan ;
Karpinchyk, Vitali ;
Koenig, Mark ;
Kong, Calvin ;
Landers, Tom ;
Le, Catherine ;
Liu, Jia ;
McBride, Celeste E. ;
Morenzoni, Matt ;
Morey, Robert E. ;
Mutch, Karl ;
Perazich, Helena ;
Perry, Kimberly ;
Peters, Brock A. ;
Peterson, Joe ;
Pethiyagoda, Charit L. ;
Pothuraju, Kaliprasad ;
Richter, Claudia ;
Rosenbaum, Abraham M. ;
Roy, Shaunak ;
Shafto, Jay ;
Sharanhovich, Uladzislau .
SCIENCE, 2010, 327 (5961) :78-81
[7]   Single-molecule DNA sequencing of a viral genome [J].
Harris, Timothy D. ;
Buzby, Phillip R. ;
Babcock, Hazen ;
Beer, Eric ;
Bowers, Jayson ;
Braslavsky, Ido ;
Causey, Marie ;
Colonell, Jennifer ;
DiMeo, James ;
Efcavitch, J. William ;
Giladi, Eldar ;
Gill, Jaime ;
Healy, John ;
Jarosz, Mirna ;
Lapen, Dan ;
Moulton, Keith ;
Quake, Stephen R. ;
Steinmann, Kathleen ;
Thayer, Edward ;
Tyurina, Anastasia ;
Ward, Rebecca ;
Weiss, Howard ;
Xie, Zheng .
SCIENCE, 2008, 320 (5872) :106-109
[8]   Genome 10K: A Proposal to Obtain Whole-Genome Sequence for 10 000 Vertebrate Species [J].
Haussler, David ;
O'Brien, Stephen J. ;
Ryder, Oliver A. ;
Barker, F. Keith ;
Clamp, Michele ;
Crawford, Andrew J. ;
Hanner, Robert ;
Hanotte, Olivier ;
Johnson, Warren E. ;
McGuire, Jimmy A. ;
Miller, Webb ;
Murphy, Robert W. ;
Murphy, William J. ;
Sheldon, Frederick H. ;
Sinervo, Barry ;
Venkatesh, Byrappa ;
Wiley, Edward O. ;
Allendorf, Fred W. ;
Amato, George ;
Baker, C. Scott ;
Bauer, Aaron ;
Beja-Pereira, Albano ;
Bermingham, Eldredge ;
Bernardi, Giacomo ;
Bonvicino, Cibele R. ;
Brenner, Sydney ;
Burke, Terry ;
Cracraft, Joel ;
Diekhans, Mark ;
Edwards, Scott ;
Ericson, Per G. P. ;
Estes, James ;
Fjeldsa, Jon ;
Flesness, Nate ;
Gamble, Tony ;
Gaubert, Philippe ;
Graphodatsky, Alexander S. ;
Graves, Jennifer A. Marshall ;
Green, Eric D. ;
Green, Richard E. ;
Hackett, Shannon ;
Hebert, Paul ;
Helgen, Kristofer M. ;
Joseph, Leo ;
Kessing, Bailey ;
Kingsley, David M. ;
Lewin, Harris A. ;
Luikart, Gordon ;
Martelli, Paolo ;
Moreira, Miguel A. M. .
JOURNAL OF HEREDITY, 2009, 100 (06) :659-674
[9]
Idury, R. M. .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1995, 2 :291, DOI 10.1089/cmb.1995.2.291
[10]
Kececioglu, John Dimitri .
Ph.D. Dissertation, 1992