AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references

Citations: 51
Authors
Bao, Ergude [1]
Jiang, Tao [1]
Girke, Thomas [2]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Dept Bot & Plant Sci, Riverside, CA 92521 USA
Funding
National Science Foundation (USA);
Keywords
SHORT DNA-SEQUENCES; SHORT READS; ARABIDOPSIS-THALIANA; DRAFT ASSEMBLIES; BRUIJN GRAPHS; PAIRED READS; ALIGNMENT; MILLIONS; QUALITY; TOOL;
DOI
10.1093/bioinformatics/btu291
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species.

Results: Here we introduce AlignGraph, an algorithm for extending and joining de novo-assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and pre-assembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the PE multipositional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7-62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9-94.5% and 80.3-165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph's efficiency in improving genome assemblies by taking advantage of closely related references.
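The abstract's central idea, that positional information from alignments helps avoid incorrect extensions at repeats, can be illustrated with a toy sketch. The code below is not AlignGraph's data structure: it is a deliberately simplified positional de Bruijn graph in which each node is a (k-mer, reference position) pair, it ignores paired-end information, and it assumes gapless, error-free alignments. The function names, the toy genome, and the reads are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers_with_positions(read, ref_start, k):
    """Yield (kmer, reference_position) for each k-mer of a read
    that is assumed to align without gaps at ref_start."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], ref_start + i


def build_graph(aligned_reads, k, positional):
    """Build a de Bruijn-style graph as a dict of successor sets.

    aligned_reads: list of (sequence, reference_start) pairs.
    positional=False: nodes are plain k-mers (conventional graph).
    positional=True:  nodes are (k-mer, position) pairs, so identical
                      k-mers aligned to different positions stay apart.
    """
    graph = defaultdict(set)
    for seq, start in aligned_reads:
        nodes = [(kmer, pos) if positional else kmer
                 for kmer, pos in kmers_with_positions(seq, start, k)]
        for node in nodes:
            graph[node]  # register the node even if it has no successor
        for a, b in zip(nodes, nodes[1:]):
            graph[a].add(b)
    return dict(graph)


def branching_nodes(graph):
    """Return nodes with more than one successor (ambiguous extensions)."""
    return [node for node, succ in graph.items() if len(succ) > 1]


# Toy genome with the repeat "GACA" at positions 2 and 8 (hypothetical data).
genome = "ATGACACCGACATT"
reads = [(genome[0:8], 0), (genome[5:14], 5)]

plain = build_graph(reads, k=3, positional=False)
positioned = build_graph(reads, k=3, positional=True)

print("plain graph:", len(plain), "nodes, branching at", branching_nodes(plain))
print("positional graph:", len(positioned), "nodes, branching at",
      branching_nodes(positioned))
```

In the conventional graph the two copies of the repeat collapse into shared nodes, which creates a branch where an extension could go wrong. Keying nodes by position keeps the copies separate, so the toy graph becomes a single unambiguous path. A practical implementation would also need to tolerate small position offsets between reads and reference, which this sketch does not attempt.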
Pages: 319-328
Number of pages: 10