Maximum Likelihood Genome Assembly

被引:63
作者
Medvedev, Paul [1 ]
Brudno, Michael [1 ,2 ]
机构
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[2] Univ Toronto, Donnelly Ctr Cellular & Biomol Res, Toronto, ON, Canada
关键词
bidirected flow; bidirected graph; Chinese postman; de Bruijn graph; genome assembly; matepairs; sequence assembly; SHORT DNA-SEQUENCES; SHORT READS; ALGORITHMS; COMPUTER; MILLIONS;
D O I
10.1089/cmb.2009.0047
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Whole genome shotgun assembly is the process of taking many short sequenced segments ( reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach ( which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with matepair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from Escherichia coli and predict copy counts with extremely high accuracy, while assembling long contigs.
引用
收藏
页码:1101 / 1116
页数:16
相关论文
共 27 条
[1]  
Ahuja R. K., 1993, Network Flows: Theory, Algorithms, and Applications
[2]  
[Anonymous], 1976, COMBINATORIAL OPTIMI
[3]  
[Anonymous], COMBINATORIAL OPTI A
[4]  
[Anonymous], 1967, LECT NOTES
[5]   A bidirected generalization of network matrices [J].
Appa, Gautam ;
Kotnyek, Balazs .
NETWORKS, 2006, 47 (04) :185-198
[6]   ALLPATHS: De novo assembly of whole-genome shotgun microreads [J].
Butler, Jonathan ;
MacCallum, Iain ;
Kleber, Michael ;
Shlyakhter, Ilya A. ;
Belmonte, Matthew K. ;
Lander, Eric S. ;
Nusbaum, Chad ;
Jaffe, David B. .
GENOME RESEARCH, 2008, 18 (05) :810-820
[7]   Fragment assembly with short reads [J].
Chaisson, M ;
Pevzner, P ;
Tang, HX .
BIOINFORMATICS, 2004, 20 (13) :2067-2074
[8]   Short read fragment assembly of bacterial genomes [J].
Chaisson, Mark J. ;
Pevzner, Pavel A. .
GENOME RESEARCH, 2008, 18 (02) :324-330
[9]  
Chen J., 2007, ADV GENOME SEQUENCIN, P123
[10]   SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing [J].
Dohm, Juliane C. ;
Lottaz, Claudio ;
Borodina, Tatiana ;
Himmelbauer, Heinz .
GENOME RESEARCH, 2007, 17 (11) :1697-1706