Readjoiner: a fast and memory efficient string graph-based sequence assembler

被引:40
作者
Gonnella, Giorgio [1 ]
Kurtz, Stefan [1 ]
机构
[1] Univ Hamburg, Ctr Bioinformat, D-20146 Hamburg, Germany
关键词
LARGE GENOMES;
D O I
10.1186/1471-2105-13-82
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads. Results: Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only. Conclusions: Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
引用
收藏
页数:19
相关论文
共 30 条
[1]  
Abouelhoda M. I., 2004, Journal of Discrete Algorithms, V2, P53, DOI 10.1016/S1570-8667(03)00065-0
[2]  
[Anonymous], 1990, Introduction to Algorithms
[3]   Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes [J].
Barthelson, Roger ;
McFarlin, Adam J. ;
Rounsley, Steven D. ;
Young, Sarah .
PLOS ONE, 2011, 6 (12)
[4]   ENGINEERING A SORT FUNCTION [J].
BENTLEY, JL ;
MCILROY, MD .
SOFTWARE-PRACTICE & EXPERIENCE, 1993, 23 (11) :1249-1265
[5]  
Boetzer M., 2010, Bioinformatics
[6]  
Bowden T, 2009, PROC FILESYSTEM
[7]   Short read fragment assembly of bacterial genomes [J].
Chaisson, Mark J. ;
Pevzner, Pavel A. .
GENOME RESEARCH, 2008, 18 (02) :324-330
[8]  
Chikhi R., 2011, LECT NOTES COMPUTER, P5
[9]   A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly [J].
Dinh, Hieu ;
Rajasekaran, Sanguthevar .
BIOINFORMATICS, 2011, 27 (14) :1901-1907
[10]  
Donmez N, 2011, LECT N BIOINFORMAT, V6577, P38, DOI 10.1007/978-3-642-20036-6_5